1.
Proc Natl Acad Sci U S A ; 118(38), 2021 09 21.
Article in English | MEDLINE | ID: mdl-34526403

ABSTRACT

The spleen contains phenotypically and functionally distinct conventional dendritic cell (cDC) subpopulations, termed cDC1 and cDC2, which each can be divided into several smaller and less well-characterized subsets. Despite advances in understanding the complexity of cDC ontogeny by transcriptional programming, the significance of posttranslational modifications in controlling tissue-specific cDC subset immunobiology remains elusive. Here, we identified the cell-surface-expressed A-disintegrin-and-metalloproteinase 10 (ADAM10) as an essential regulator of cDC1 and cDC2 homeostasis in the splenic marginal zone (MZ). Mice with a CD11c-specific deletion of ADAM10 (ADAM10ΔCD11c) exhibited a complete loss of splenic ESAMhi cDC2A because ADAM10 regulated the commitment, differentiation, and survival of these cells. The major pathways controlled by ADAM10 in ESAMhi cDC2A are Notch, signaling pathways involved in cell proliferation and survival (e.g., mTOR, PI3K/AKT, and EIF2 signaling), and EBI2-mediated localization within the MZ. In addition, we discovered that ADAM10 is a molecular switch regulating cDC2 subset heterogeneity in the spleen, as the disappearance of ESAMhi cDC2A in ADAM10ΔCD11c mice was compensated for by the emergence of a Clec12a+ cDC2B subset closely resembling cDC2 generally found in peripheral lymph nodes. Moreover, in ADAM10ΔCD11c mice, terminal differentiation of cDC1 was abrogated, resulting in severely reduced splenic Langerin+ cDC1 numbers. Next to the disturbed splenic cDC compartment, ADAM10 deficiency on CD11c+ cells led to an increase in marginal metallophilic macrophage (MMM) numbers. In conclusion, our data identify ADAM10 as a molecular hub on both cDC and MMM regulating their transcriptional programming, turnover, homeostasis, and ability to shape the anatomical niche of the MZ.


Subjects
ADAM10 Protein/metabolism , Amyloid Precursor Protein Secretases/metabolism , Dendritic Cells/metabolism , Membrane Proteins/metabolism , ADAM10 Protein/physiology , Amyloid Precursor Protein Secretases/physiology , Animals , Antigen-Presenting Cells/metabolism , CD11c Antigen/metabolism , Cell Differentiation , Cell Proliferation , Female , Homeostasis , Lymphoid Tissue/metabolism , Macrophages/metabolism , Male , Membrane Proteins/physiology , Mice , Mice, Inbred C57BL , Myeloid Cells/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Protein Processing, Post-Translational/genetics , Protein Processing, Post-Translational/physiology , Signal Transduction , Spleen/cytology , Spleen/metabolism
2.
Curr Issues Mol Biol ; 45(12): 9904-9916, 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38132464

ABSTRACT

Lipids are important modifiers of protein function, particularly as parts of lipoproteins, which transport lipophilic substances and mediate cellular uptake of circulating lipids. As such, lipids are of particular interest as blood biological markers for cardiovascular disease (CVD) as well as for conditions linked to CVD such as atherosclerosis, diabetes mellitus, obesity and dietary states. Notably, lipid research is particularly well developed in the context of CVD because of the relevance and multiple causes and risk factors of CVD. The advent of methods for high-throughput screening of biological molecules has recently resulted in the generation of lipidomic profiles that allow monitoring of lipid compositions in biological samples in an untargeted manner. These and other earlier advances in biomedical research have shaped the knowledge we have about lipids in CVD. To evaluate the knowledge acquired on the multiple biological functions of lipids in CVD and the trends in their research, we collected a dataset of references from the PubMed database of biomedical literature focused on plasma lipids and CVD in human and mouse. Using annotations from these records, we were able to categorize significant associations between lipids and particular types of research approaches, distinguish non-biological lipids used as markers, identify differential research between human and mouse models, and detect the increasingly mechanistic nature of the results in this field. Using known associations between lipids and proteins that metabolize or transport them, we constructed a comprehensive lipid-protein network, which we used to highlight proteins strongly connected to lipids found in the CVD-lipid literature. Our approach points to a series of proteins for which lipid-focused research would bring insights into CVD, including Prostaglandin G/H synthase 2 (PTGS2, a.k.a. COX2) and Acylglycerol kinase (AGK). 
In this review, we summarize our findings, putting them in a historical perspective of the evolution of lipid research in CVD.

3.
BMC Bioinformatics ; 23(Suppl 6): 279, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35836114

ABSTRACT

BACKGROUND: The constant evolution and development of next-generation sequencing techniques lead to high-throughput data composed of datasets that include a large number of biological samples. Although a large number of samples are usually experimentally processed in batches, scientific publications are often elusive about this information, which can greatly impact the quality of the samples and confound further statistical analyses. Because dedicated bioinformatics methods developed to detect unwanted sources of variance in the data can wrongly detect real biological signals, such methods could benefit from using a quality-aware approach. RESULTS: We recently developed statistical guidelines and a machine learning tool to automatically evaluate the quality of a next-generation-sequencing sample. We leveraged this quality assessment to detect and correct batch effects in 12 publicly available RNA-seq datasets with available batch information. We were able to distinguish batches by our quality score and used it to correct for some batch effects in sample clustering. Overall, the correction was evaluated as comparable to or better than the reference method that uses a priori knowledge of the batches (in 10 and 1 datasets of 12, respectively; total = 92%). When coupled to outlier removal, the correction was more often evaluated as better than the reference (comparable or better in 5 and 6 datasets of 12, respectively; total = 92%). CONCLUSIONS: In this work, we show the capabilities of our software to detect batches in public RNA-seq datasets from differences in the predicted quality of their samples. We also use these insights to correct the batch effect and observe the relation of sample quality and batch effect. These observations reinforce our expectation that while batch effects do correlate with differences in quality, batch effects also arise from other artifacts and are more suitably corrected statistically in well-designed experiments.


Subjects
Algorithms , Software , Cluster Analysis , Machine Learning , RNA-Seq
4.
Bioinformatics ; 37(21): 3981-3982, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34358314

ABSTRACT

SUMMARY: Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked with many complications including obesity, diabetes, metabolic disorders, cancer and more. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions. Hence, the field of lipidomics has been expanding in recent years. Even though the role of individual lipids in diseases has been investigated, there is no resource to perform disease enrichment analysis considering the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of records from the PubMed biomedical literature database discussing lipids and diseases, predicts their association and ranks them according to false discovery rates generated by random simulations. The tool takes into account 4270 diseases and 4798 lipids. Since the tool extracts the information from PubMed records, the number of diseases and lipids will be expanded over time as the biomedical literature grows. AVAILABILITY AND IMPLEMENTATION: The LipiDisease webserver can be freely accessed at http://cbdm-01.zdv.uni-mainz.de:3838/piyusmor/LipiDisease/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
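
The ranking by random simulations mentioned in this abstract can be illustrated with a small Python sketch. It is not LipiDisease's implementation: the co-mention counts, the lipid names and the function `empirical_p_value` are hypothetical, and an empirical p-value computed from random lipid sets is used here as a simpler stand-in for the false discovery rates that the abstract describes.

```python
import random


def empirical_p_value(lipid_set, counts, n_sim=1000, seed=0):
    """Share of random lipid sets scoring at least as high as lipid_set.

    counts maps each lipid to a (made-up) number of records that mention
    it together with one disease; the score of a set is the sum of counts.
    """
    rng = random.Random(seed)
    lipids = sorted(counts)
    observed = sum(counts[lipid] for lipid in lipid_set)
    hits = 0
    for _ in range(n_sim):
        random_set = rng.sample(lipids, len(lipid_set))
        if sum(counts[lipid] for lipid in random_set) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p = 0.
    return (hits + 1) / (n_sim + 1)


# Hypothetical co-mention counts for 20 lipids and a single disease.
counts = {f"lipid_{i:02d}": i * i for i in range(20)}
p_top = empirical_p_value(["lipid_17", "lipid_18", "lipid_19"], counts)
p_low = empirical_p_value(["lipid_00", "lipid_01", "lipid_02"], counts)
```

In this toy setting the set with the highest counts receives a small p-value, while the set with the lowest counts receives exactly 1.0, because every random set scores at least as high.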


Subjects
Lipids , Software , PubMed , Databases, Factual , Lipids/analysis , Data Mining
5.
Nucleic Acids Res ; 48(9): e53, 2020 05 21.
Article in English | MEDLINE | ID: mdl-32187374

ABSTRACT

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate ChIP-seq variability, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets on a particular cell type. Using our method, we found variable regions in human cell lines K562, GM12878, HepG2, MCF-7 and in mouse embryonic stem cells (mESCs). These variable-occupancy target regions (VOTs) are CG dinucleotide rich, and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions producing non-specific binding ChIP-seq peaks. Furthermore, in mESCs, VOTs are conserved among placental species suggesting that they could have a function important for this taxon. Our method can be useful to point to such regions along the genome in a given cell type of interest, to improve the downstream interpretative analysis before follow-up experiments.


Subjects
Chromatin Immunoprecipitation Sequencing , Transcription Factors/metabolism , Animals , Binding Sites , Cell Line , Cell Line, Tumor , Chromatin/metabolism , Embryonic Stem Cells/metabolism , Evolution, Molecular , Genetic Variation , Genomics/methods , Humans , K562 Cells , MCF-7 Cells , Mice , Nucleotides/analysis , Principal Component Analysis , Promoter Regions, Genetic , R-Loop Structures
6.
Genes Dev ; 27(17): 1932-46, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-24013505

ABSTRACT

Understanding how distinct cell types arise from multipotent progenitor cells is a major quest in stem cell biology. The liver and pancreas share many aspects of their early development and possibly originate from a common progenitor. However, how liver and pancreas cells diverge from a common endoderm progenitor population and adopt specific fates remains elusive. Using RNA sequencing (RNA-seq), we defined the molecular identity of liver and pancreas progenitors that were isolated from the mouse embryo at two time points, spanning the period when the lineage decision is made. The integration of temporal and spatial gene expression profiles unveiled mutually exclusive signaling signatures in hepatic and pancreatic progenitors. Importantly, we identified the noncanonical Wnt pathway as a potential developmental regulator of this fate decision and capable of inducing the pancreas program in endoderm and liver cells. Our study offers an unprecedented view of gene expression programs in liver and pancreas progenitors and forms the basis for formulating lineage-reprogramming strategies to convert adult hepatic cells into pancreatic cells.


Subjects
Cell Differentiation , Gene Expression Regulation, Developmental , Liver , Pancreas , Signal Transduction , Stem Cells/cytology , Animals , Cell Line , Cell Lineage , Endoderm/cytology , Gene Expression Profiling , Liver/cytology , Liver/embryology , Mice , Pancreas/cytology , Pancreas/embryology , Sequence Analysis, RNA , Time Factors , Wnt Proteins/genetics , Wnt Proteins/metabolism , Xenopus/embryology
7.
Methods ; 132: 57-65, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28716510

ABSTRACT

Toxicity affecting humans is studied by observing the effects of chemical substances in animal organisms (in vivo) or in animal and human cultivated cell lines (in vitro). Toxicogenomics studies collect gene expression profiles and histopathology assessment data for hundreds of drugs and pollutants in standardized experimental designs using different model systems. These data are an invaluable source for analyzing genome-wide drug response in biological systems. However, a problem remains that is how to evaluate the suitability of heterogeneous in vitro and in vivo systems to model the many different aspects of human toxicity. We propose here that a given model system (cell type or animal organ) is supported to appropriately describe a particular aspect of human toxicity if the set of compounds associated in the literature with that aspect of toxicity causes a change in expression of genes with a particular function in the tested model system. This approach provides candidate genes to explain the toxicity effect (the differentially expressed genes) and the compounds whose effect could be modeled (the ones producing both the change of expression in the model system and that are associated with the human phenotype in the literature). Here we present an application of this approach using a computational pipeline that integrates compound-induced gene expression profiles (from the Open TG-GATEs database) and biomedical literature annotations (from the PubMed database) to evaluate the suitability of (human and rat) in vitro systems as well as rat in vivo systems to model human toxicity.


Subjects
Drug Evaluation, Preclinical/methods , Animals , Cells, Cultured , Hepatocytes/drug effects , Hepatocytes/physiology , Humans , Rats , Toxicogenetics , Transcriptome
8.
Methods ; 74: 90-6, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25484337

ABSTRACT

Clinical evaluation of patients and diagnosis of disorder is crucial to make decisions on appropriate therapies. In addition, in the case of genetic disorders resulting from gene abnormalities, phenotypic effects may guide basic research on the mechanisms of a disorder to find the mutated gene and therefore to propose novel targets for drug therapy. However, this approach is complicated by two facts. First, the relationship between genes and disorders is not simple: one gene may be related to multiple disorders and a disorder may be caused by mutations in different genes. Second, recognizing relevant phenotypes might be difficult for clinicians working with patients of closely related complex disorders. Neuropsychiatric disorders best illustrate these difficulties since phenotypes range from metabolic to behavioral aspects, the latter extremely complex. Based on our clinical expertise on five neurodegenerative disorders, and from the wealth of bibliographical data on neuropsychiatric disorders, we have built a resource to infer associations between genes, chemicals, phenotypes for a total of 31 disorders. An initial step of automated text mining of the literature related to 31 disorders returned thousands of enriched terms. Fewer relevant phenotypic terms were manually selected by clinicians as relevant to the five neural disorders of their expertise and used to analyze the complete set of disorders. Analysis of the data indicates general relationships between neuropsychiatric disorders, which can be used to classify and characterize them. Correlation analyses allowed us to propose novel associations of genes and drugs with disorders. 
More generally, the results led us to uncover mechanisms of disease that span multiple neuropsychiatric disorders, for example that genes related to synaptic transmission and receptor functions tend to be involved in many disorders, whereas genes related to sensory perception and channel transport functions are associated with fewer disorders. Our study shows that starting from expertise covering a limited set of neurological disorders and using text and data mining methods, meaningful and novel associations regarding genes, chemicals and phenotypes can be derived for an expanded set of neuropsychiatric disorders. Our results are intended for clinicians to help them evaluate patients, and for basic scientists to propose new gene targets for drug therapies. This strategy can be extended to virtually all diseases and takes advantage of the ever-increasing amount of biomedical literature.


Subjects
Data Mining/methods , Databases, Genetic , Gene Regulatory Networks/genetics , Mental Disorders/genetics , Phenotype , Databases, Genetic/standards , Humans
9.
Nucleic Acids Res ; 42(Database issue): D950-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304896

ABSTRACT

CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and in different development stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 50 951 cell lines. The database currently contains 3055 microscopic and anatomical images, 205 whole-genome expression profiles of 194 cell/tissue types from RNA-seq and microarrays and 553 905 protein expressions for 535 cells/tissues. Text mining of a corpus of >2000 publications followed by manual curation confirmed expression information on ∼900 proteins and genes. CellFinder's data model can seamlessly represent entities from single cells to the organ level, incorporate mappings between homologous entities in different species and describe processes of cell development and differentiation. Its ontological backbone currently consists of 204 741 ontology terms incorporated from 10 different ontologies unified under the novel CELDA ontology. CellFinder's web portal allows searching, browsing and comparing the stored data, interactive construction of developmental trees and navigating the partonomic hierarchy of cells and tissues through a unique body browser designed for life scientists and clinicians.


Subjects
Cells/metabolism , Databases, Factual , Animals , Cell Line , Cell Physiological Phenomena , Cells/cytology , Cellular Structures/ultrastructure , Data Mining , Gene Expression Profiling , Humans , Internet , Kidney/cytology , Liver/cytology , Proteins/metabolism , RNA/metabolism
10.
Nucleic Acids Res ; 41(3): 1496-507, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23275563

ABSTRACT

The yeast two-hybrid (Y2H) system is the most widely applied methodology for systematic protein-protein interaction (PPI) screening and the generation of comprehensive interaction networks. We developed a novel Y2H interaction screening procedure using DNA microarrays for high-throughput quantitative PPI detection. Applying a global pooling and selection scheme to a large collection of human open reading frames, proof-of-principle Y2H interaction screens were performed for the human neurodegenerative disease proteins huntingtin and ataxin-1. Using systematic controls for unspecific Y2H results and quantitative benchmarking, we identified and scored a large number of known and novel partner proteins for both huntingtin and ataxin-1. Moreover, we show that this parallelized screening procedure and the global inspection of Y2H interaction data are uniquely suited to define specific PPI patterns and their alteration by disease-causing mutations in huntingtin and ataxin-1. This approach takes advantage of the specificity and flexibility of DNA microarrays and of the existence of solid-related statistical methods for the analysis of DNA microarray data, and allows a quantitative approach toward interaction screens in human and in model organisms.


Subjects
Oligonucleotide Array Sequence Analysis/methods , Two-Hybrid System Techniques , Ataxin-1 , Ataxins , Humans , Huntingtin Protein , Mutation , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Open Reading Frames , Protein Interaction Maps , Yeasts/genetics
11.
PLoS Comput Biol ; 9(1): e1002860, 2013.
Article in English | MEDLINE | ID: mdl-23300433

ABSTRACT

Interactions of proteins regulate signaling, catalysis, gene expression and many other cellular functions. Therefore, characterizing the entire human interactome is a key effort in current proteomics research. This challenge is complicated by the dynamic nature of protein-protein interactions (PPIs), which are conditional on the cellular context: both interacting proteins must be expressed in the same cell and localized in the same organelle to meet. Additionally, interactions underlie a delicate control of signaling pathways, e.g. by post-translational modifications of the protein partners - hence, many diseases are caused by the perturbation of these mechanisms. Despite the high degree of cell-state specificity of PPIs, many interactions are measured under artificial conditions (e.g. yeast cells are transfected with human genes in yeast two-hybrid assays) or even if detected in a physiological context, this information is missing from the common PPI databases. To overcome these problems, we developed a method that assigns context information to PPIs inferred from various attributes of the interacting proteins: gene expression, functional and disease annotations, and inferred pathways. We demonstrate that context consistency correlates with the experimental reliability of PPIs, which allows us to generate high-confidence tissue- and function-specific subnetworks. We illustrate how these context-filtered networks are enriched in bona fide pathways and disease proteins to prove the ability of context-filters to highlight meaningful interactions with respect to various biological questions. We use this approach to study the lung-specific pathways used by the influenza virus, pointing to IRAK1, BHLHE40 and TOLLIP as potential regulators of influenza virus pathogenicity, and to study the signalling pathways that play a role in Alzheimer's disease, identifying a pathway involving the altered phosphorylation of the Tau protein. 
Finally, we provide the annotated human PPI network via a web frontend that allows the construction of context-specific networks in several ways.
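
The expression-based part of this context filtering can be sketched in Python. This is a minimal illustration and not the authors' method: the protein identifiers, the expression table and the function `tissue_subnetwork` are hypothetical, and only the rule stated in the abstract (both partners must be expressed in the same cellular context) is applied.

```python
def tissue_subnetwork(interactions, expressed_in, tissue):
    """Keep interactions whose two partners are both expressed in tissue."""
    return [
        (a, b)
        for a, b in interactions
        if tissue in expressed_in.get(a, set())
        and tissue in expressed_in.get(b, set())
    ]


# Toy data: placeholder protein names, not real annotations.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
expressed_in = {
    "P1": {"lung", "brain"},
    "P2": {"lung"},
    "P3": {"brain", "lung"},
    "P4": {"brain"},
}
lung_network = tissue_subnetwork(interactions, expressed_in, "lung")
brain_network = tissue_subnetwork(interactions, expressed_in, "brain")
```

With this toy table the lung subnetwork keeps the P1-P2 and P2-P3 pairs, and the brain subnetwork keeps only P3-P4. A fuller version would also have to weigh the functional, disease and pathway annotations that the abstract lists.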


Subjects
Proteins/metabolism , Alzheimer Disease/metabolism , Biocatalysis , Humans , Phosphorylation , Protein Binding , Proteome , Signal Transduction , Viral Proteins/metabolism
12.
BMC Bioinformatics ; 14: 113, 2013 Mar 27.
Article in English | MEDLINE | ID: mdl-23537461

ABSTRACT

BACKGROUND: A popular query from scientists reading a biomedical abstract is to search for topic-related documents in bibliographic databases. Such a query is challenging because the amount of information attached to a single abstract is little, whereas classification-based retrieval algorithms are optimally trained with large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references. RESULTS: Data on cited references and text sections in 249,108 full-text biomedical articles was extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of articles (98.4%) and their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstract classification tool, cited references allowed accurate retrieval of the citing document in a test set of 10,000 documents and also of documents related to six biomedical topics defined by particular MeSH® terms from the entire PMC-OA (p-value<0.01). Classification performance was sensitive to the topic and also to the text sections from which the references were selected. Classifiers trained on the baseline (i.e., only text from the query document and not from the references) were outperformed in almost all the cases. Best performance was often obtained when using all cited references, though using the references from Introduction and Discussion sections led to similarly good results. This query expansion method performed significantly better than pseudo relevance feedback in 4 out of 6 topics. CONCLUSIONS: The retrieval of documents related to a single document can be significantly improved by using the references cited by this document (p-value<0.01). 
Using references from Introduction and Discussion performs almost as well as using all references, which might be useful for methods that require reduced datasets due to computational limitations. Cited references from particular sections might not be appropriate for all topics. Our method could be a better alternative to pseudo relevance feedback though it is limited by full text availability.
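
The idea of expanding a query document with the text of its cited references can be sketched as follows. This is an assumption-laden toy, not the published pipeline: the study used the MedlineRanker classifier, whereas this sketch swaps in a plain bag-of-words cosine similarity, and all texts, document identifiers and function names are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(query_text, candidates):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    query = bag_of_words(query_text)
    scores = {
        doc_id: cosine(query, bag_of_words(text))
        for doc_id, text in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented example texts.
query = "batch effects in sequencing data"
cited = [
    "quality control of RNA-seq samples",
    "normalization of gene expression counts",
]
candidates = {
    "doc_related": "gene expression normalization for RNA-seq quality",
    "doc_unrelated": "dendritic cell subsets of the spleen",
}
baseline = rank_candidates(query, candidates)
expanded = rank_candidates(" ".join([query, *cited]), candidates)
```

In this example the query text alone shares no words with either candidate, so both baseline scores are zero, while the expanded query ranks `doc_related` first. The small non-zero score of `doc_unrelated` comes from the shared word "of", which is why a real system would remove stop words or use a trained classifier.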


Subjects
Data Mining/methods , PubMed , Algorithms , Medical Subject Headings
13.
BMC Bioinformatics ; 14: 228, 2013 Jul 17.
Article in English | MEDLINE | ID: mdl-23865855

ABSTRACT

BACKGROUND: The need for detailed description and modeling of cells drives the continuous generation of large and diverse datasets. Unfortunately, there exists no systematic and comprehensive way to organize these datasets and their information. CELDA (Cell: Expression, Localization, Development, Anatomy) is a novel ontology for the association of primary experimental data and derived knowledge to various types of cells of organisms. RESULTS: CELDA is a structure that can help to categorize cell types based on species, anatomical localization, subcellular structures, developmental stages and origin. It targets cells in vitro as well as in vivo. Instead of developing a novel ontology from scratch, we carefully designed CELDA in such a way that existing ontologies were integrated as much as possible, and only minimal extensions were performed to cover those classes and areas not present in any existing model. Currently, ten existing ontologies and models are linked to CELDA through the top-level ontology BioTop. Together with 15,439 newly created classes, CELDA contains more than 196,000 classes and 233,670 relationship axioms. CELDA is primarily used as a representational framework for modeling, analyzing and comparing cells within and across species in CellFinder, a web based data repository on cells (http://cellfinder.org). CONCLUSIONS: CELDA can semantically link diverse types of information about cell types. It has been integrated within the research platform CellFinder, where it exemplarily relates cell types from liver and kidney during development on the one hand and anatomical locations in humans on the other, integrating information on all spatial and temporal stages. CELDA is available from the CellFinder website: http://cellfinder.org/about/ontology.


Subjects
Cells/classification , Vocabulary, Controlled , Cells/metabolism , Cellular Structures , Embryonic Stem Cells , Gene Expression , Humans , Kidney/cytology
14.
Nucleic Acids Res ; 39(Web Server issue): W455-61, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21609954

ABSTRACT

Biomedical literature is traditionally used as a way to inform scientists of the relevance of genes in relation to a research topic. However, many genes, especially from poorly studied organisms, are not discussed in the literature. Moreover, a manual and comprehensive summarization of the literature attached to the genes of an organism is in general impossible due to the high number of genes and abstracts involved. We introduce the novel Génie algorithm that overcomes these problems by evaluating the literature attached to all genes in a genome and to their orthologs according to a selected topic. Génie showed high precision (up to 100%) and the best performance in comparison to other algorithms in most of the benchmarks, especially when high sensitivity was required. Moreover, the prioritization of zebrafish genes involved in heart development, using human and mouse orthologs, showed high enrichment in differentially expressed genes from microarray experiments. The Génie web server supports hundreds of species, millions of genes and offers novel functionalities. Common run times below a minute, even when analyzing the human genome with hundreds of thousands of literature records, allow the use of Génie in routine lab work. AVAILABILITY: http://cbdm.mdc-berlin.de/tools/genie/.


Subjects
Genes , Software , Algorithms , Animals , Gene Expression Profiling , Genomics , Heart/embryology , Humans , Internet , MEDLINE , Mice , Models, Animal , Zebrafish/embryology , Zebrafish/genetics
15.
Bioinformatics ; 27(17): 2414-21, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21798963

ABSTRACT

MOTIVATION: Protein-protein interaction (PPI) databases are widely used tools to study cellular pathways and networks; however, there are several databases available that still do not account for cell type-specific differences. Here, we evaluated the characteristics of six interaction databases, incorporated tissue-specific gene expression information and finally, investigated if the most popular proteins of scientific literature are involved in good quality interactions. RESULTS: We found that the evaluated databases are comparable in terms of node connectivity (i.e. proteins with few interaction partners also have few interaction partners in other databases), but may differ in the identity of interaction partners. We also observed that the incorporation of tissue-specific expression information significantly altered the interaction landscape and finally, we demonstrated that many of the most intensively studied proteins are engaged in interactions associated with low confidence scores. In summary, interaction databases are valuable research tools but may lead to different predictions on interactions or pathways. The accuracy of predictions can be improved by incorporating datasets on organ- and cell type-specific gene expression, and by obtaining additional interaction evidence for the most 'popular' proteins. CONTACT: kitano@sbi.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Protein , Protein Interaction Mapping , Gene Expression , Humans , Proteins/genetics , Proteins/metabolism
16.
PLoS One ; 17(7): e0270043, 2022.
Article in English | MEDLINE | ID: mdl-35776722

ABSTRACT

MOTIVATION: Single-cell Chromatin ImmunoPrecipitation DNA-Sequencing (scChIP-seq) analysis is challenging due to data sparsity. A high degree of sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from the ENCODE project to impute missing protein-DNA interacting regions of target histone marks or transcription factors. RESULTS: Imputations using machine learning models trained for each single cell, each ChIP protein target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene identification on real human data. Results on bulk data simulating single cells show that the imputations are single-cell specific as the imputed profiles are closer to the simulated cell than to other cells related to the same ChIP protein target and the same cell type. Simulations also show that 100 input genomic regions are already enough to train single-cell specific models for the imputation of thousands of undetected regions. Furthermore, SIMPA enables the interpretation of machine learning models by revealing interaction sites of a given single cell that are most important for the imputation model trained for a specific genomic region. The corresponding feature importance values derived from promoter-interaction profiles of H3K4me3, an activating histone mark, highly correlate with co-expression of genes that are present within the cell-type specific pathways in 2 real human and mouse datasets. SIMPA's interpretable imputation method allows users to gain a deep understanding of individual cells and, consequently, of sparse scChIP-seq datasets. AVAILABILITY AND IMPLEMENTATION: Our interpretable imputation algorithm was implemented in Python and is available at https://github.com/salbrec/SIMPA.


Subjects
Genomics , Machine Learning , Animals , Cluster Analysis , DNA , Mice , DNA Sequence Analysis/methods
17.
Genes (Basel) ; 13(5)2022 05 20.
Article in English | MEDLINE | ID: mdl-35627304

ABSTRACT

The gene family of insect olfactory receptors (ORs) has expanded greatly over the course of evolution. ORs enable insects to detect volatile chemicals and therefore play an important role in social interactions, enemy and prey recognition, and foraging. The sequences of several thousand ORs are known, but specific functions or ligands have been identified for only very few of them. To advance the functional characterization of ORs, we have assembled, curated, and aligned the sequences of 3902 ORs from 21 insect species, which we provide as an annotated online resource. Using functionally characterized proteins from the fly Drosophila melanogaster, the mosquito Anopheles gambiae, and the ant Harpegnathos saltator, we identified amino acid positions that best predict response to ligands. We examined the conservation of these predicted relevant residues in all OR subfamilies; the results showed that the subfamilies that expanded strongly in social insects had a high degree of conservation in their binding sites. This suggests that the ORs of social insect families are typically finely tuned and exhibit sensitivity to very similar odorants. Our novel approach provides a powerful tool to exploit functional information from a limited number of genes to study the functional evolution of large gene families.


Subjects
Odorant Receptors , Animals , Drosophila melanogaster/metabolism , Insect Proteins/metabolism , Insects/genetics , Insects/metabolism , Ligands , Odorant Receptors/genetics , Odorant Receptors/metabolism
18.
BMC Bioinformatics ; 12: 435, 2011 Nov 09.
Article in English | MEDLINE | ID: mdl-22070195

ABSTRACT

BACKGROUND: Biological function is greatly dependent on the interactions of proteins with other proteins and genes. Abstracts from the biomedical literature stored in the NCBI's PubMed database can be used for the derivation of interactions between genes and proteins by identifying the co-occurrences of their terms. Often, the number of interactions obtained through such an approach is large and may mix processes occurring in different contexts. Current tools do not allow studying these data with a focus on concepts of relevance to a user, for example, interactions related to a disease or to a biological mechanism such as protein aggregation. RESULTS: To support concept-oriented exploration of such data, we developed PESCADOR, a web tool that extracts a network of interactions from a set of PubMed abstracts given by a user, and allows filtering the interaction network according to user-defined concepts. We illustrate its use in exploring protein aggregation in neurodegenerative disease and in the expansion of pathways associated with colon cancer. CONCLUSIONS: PESCADOR is a platform-independent web resource available at: http://cbdm.mdc-berlin.de/tools/pescador/
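
The co-occurrence idea described in this abstract can be illustrated with a minimal Python sketch. It is not PESCADOR's implementation: the function name `cooccurrence_network`, the naive case-insensitive substring matching, and the placeholder terms `geneA`, `geneB`, `geneC` are all assumptions made for illustration, and the toy sentences are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(abstracts, terms, concept=None):
    """Count how often each pair of terms appears in the same abstract.

    If `concept` is given, only abstracts mentioning it are counted,
    which mimics filtering the network by a user-defined concept.
    Matching is a naive case-insensitive substring test (an assumption;
    a real system would need proper term recognition).
    """
    edges = Counter()
    for text in abstracts:
        lowered = text.lower()
        if concept is not None and concept.lower() not in lowered:
            continue
        # Sort so that each pair is stored under one canonical key.
        found = sorted(t for t in terms if t.lower() in lowered)
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


# Invented toy abstracts with placeholder gene names.
toy_abstracts = [
    "geneA and geneB are linked to protein aggregation.",
    "geneA is mentioned with geneC in membrane trafficking.",
    "geneB aggregation involves geneA.",
]
toy_terms = ["geneA", "geneB", "geneC"]

print(dict(cooccurrence_network(toy_abstracts, toy_terms)))
print(dict(cooccurrence_network(toy_abstracts, toy_terms, concept="aggregation")))
```

With the concept filter set to "aggregation", only the two abstracts that mention the concept contribute edges, so the pair involving `geneC` drops out of the network.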


Subjects
Data Mining , PubMed , Software , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , Humans , Internet , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/metabolism , Proteins/genetics , Proteins/metabolism
19.
BMC Bioinformatics ; 12 Suppl 8: S3, 2011 Oct 03.
Article in English | MEDLINE | ID: mdl-22151929

ABSTRACT

BACKGROUND: Determining the usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose, the BCIII-ACT corpus was provided, which includes training, development and test sets of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, together with the recorded human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches.
For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, the best MCC score 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%. CONCLUSIONS: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time needed with unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
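
Because this abstract reports both accuracy and the Matthews correlation coefficient on an unbalanced collection, a short Python sketch of the two metrics may help show why they can diverge. The formulas are the standard definitions; the confusion-matrix counts are invented for illustration and are not BioCreative data, and returning 0.0 for an undefined MCC is a convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (assumed convention),
    for example when a classifier predicts only one class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical unbalanced test set: 120 relevant, 880 non-relevant items.
# A classifier that labels everything "non-relevant" still looks accurate,
# but its MCC shows it carries no information about the relevant class.
always_negative = dict(tp=0, tn=880, fp=0, fn=120)
print(accuracy(**always_negative))
print(matthews_corrcoef(**always_negative))

# A classifier that finds some of the relevant items.
some_hits = dict(tp=50, tn=840, fp=40, fn=70)
print(accuracy(**some_hits))
print(matthews_corrcoef(**some_hits))
```

On unbalanced data, accuracy is dominated by the majority class, which is one reason MCC is reported alongside it.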


Subjects
Algorithms , Data Mining , Proteins/metabolism , Animals , Protein Databases , Humans , Periodicals as Topic , PubMed
20.
Nucleic Acids Res ; 37(Web Server issue): W141-6, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19429696

ABSTRACT

The biomedical literature is represented by millions of abstracts available in the Medline database. These abstracts can be queried with the PubMed interface, which provides a keyword-based Boolean search engine. This approach shows limitations in the retrieval of abstracts related to very specific topics, as it is difficult for a non-expert user to find all of the most relevant keywords related to a biomedical topic. Additionally, when searching for more general topics, the same approach may return hundreds of unranked references. To address these issues, text mining tools have been developed to help scientists focus on relevant abstracts. We have implemented the MedlineRanker webserver, which allows a flexible ranking of Medline for a topic of interest without expert knowledge. Given some abstracts related to a topic, the program automatically deduces the most discriminative words in comparison to a random selection. These words are used to score other abstracts, including those from not yet annotated recent publications, which can then be ranked by relevance. We show that our tool can be highly accurate and that it is able to process millions of abstracts in a practical amount of time. MedlineRanker is free to use and is available at http://cbdm.mdc-berlin.de/tools/medlineranker.
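
The ranking idea in this abstract (derive discriminative words from topic abstracts versus a background sample, then score other abstracts) can be sketched in Python as follows. The abstract does not state which statistic MedlineRanker uses, so the smoothed log-odds weighting here is a stand-in chosen for illustration; the function names `tokenize`, `word_weights`, and `rank`, and the toy texts, are likewise assumptions.

```python
import math
import re


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_weights(topic_abstracts, background_abstracts):
    """Weight each word by a smoothed log-odds of appearing in topic
    abstracts versus background abstracts (illustrative statistic)."""
    topic_docs = [tokenize(t) for t in topic_abstracts]
    background_docs = [tokenize(t) for t in background_abstracts]
    vocab = set().union(*topic_docs, *background_docs)
    weights = {}
    for word in vocab:
        # Add-one smoothing keeps both probabilities strictly positive.
        p_topic = (sum(word in d for d in topic_docs) + 1) / (len(topic_docs) + 2)
        p_bg = (sum(word in d for d in background_docs) + 1) / (len(background_docs) + 2)
        weights[word] = math.log(p_topic / p_bg)
    return weights


def rank(candidates, weights):
    """Score each candidate by summing the weights of its words and
    return (score, text) pairs, highest score first."""
    scored = [(sum(weights.get(w, 0.0) for w in tokenize(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented toy texts.
topic = ["protein aggregation in neurons", "amyloid protein aggregation"]
background = ["splenic dendritic cells", "insect olfactory receptors"]
candidates = ["olfactory receptors in ants", "aggregation of amyloid fibrils"]

for score, text in rank(candidates, word_weights(topic, background)):
    print(round(score, 3), text)
```

Words never seen in either training set contribute zero, so abstracts are ranked only by vocabulary that distinguishes the topic from the background.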


Subjects
Information Storage and Retrieval/methods , MEDLINE , Software , User-Computer Interface