1.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38147362

ABSTRACT

MOTIVATION: Up-to-date pathway knowledge is usually presented in scientific publications for human reading, making it difficult to utilize these resources for semantic integration and computational analysis of biological pathways. We here present an approach to mining knowledge graphs by combining manual curation with automated named entity recognition and automated relation extraction. This approach allows us to study pathway-related questions in detail, which we here show using the ketamine pathway, aiming to help improve understanding of the role of gut microbiota in the antidepressant effects of ketamine. RESULTS: The thus devised ketamine pathway 'KetPath' knowledge graph comprises five parts: (i) manually curated pathway facts from images; (ii) recognized named entities in biomedical texts; (iii) identified relations between named entities; (iv) our previously constructed microbiota and pre-/probiotics knowledge bases; and (v) multiple community-accepted public databases. We first assessed the performance of automated extraction of relations between named entities using the specially designed state-of-the-art tool BioKetBERT. The query results show that we can retrieve drug actions, pathway relations, co-occurring entities, and their relations. These results uncover several biological findings, such as various gut microbes leading to increased expression of BDNF, which may contribute to the sustained antidepressant effects of ketamine. We envision that the methods and findings from this research will aid researchers who wish to integrate and query data and knowledge from multiple biomedical databases and literature simultaneously. AVAILABILITY AND IMPLEMENTATION: Data and query protocols are available in the KetPath repository at https://dx.doi.org/10.5281/zenodo.8398941 and https://github.com/tingcosmos/KetPath.


Subject(s)
Gastrointestinal Microbiome , Ketamine , Humans , Ketamine/pharmacology , Databases, Factual , Antidepressive Agents/pharmacology , Neurotransmitter Agents , Data Mining/methods
2.
Bioinformatics ; 38(8): 2111-2118, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35150231

ABSTRACT

MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
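
The AUC-ROC values quoted above can be read as the probability that a randomly chosen interface residue receives a higher score than a randomly chosen non-interface residue. The following is a minimal sketch of that rank-based calculation in plain Python. The function name `auc_roc` and the toy scores and labels are assumptions made for illustration; they are not taken from PIPENN or its benchmarks.

```python
def auc_roc(scores, labels):
    """Rank-based AUC-ROC.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue scores; label 1 marks an interface residue.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 0, 1, 1, 0, 0]
print(auc_roc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of residues, which is fine for a demonstration; a sort-based implementation would be preferable for large test sets.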


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Software , Amino Acid Sequence , Nucleotides , Computational Biology/methods
3.
PLoS Comput Biol ; 18(12): e1010669, 2022 12.
Article in English | MEDLINE | ID: mdl-36454728

ABSTRACT

The ubiquitous availability of genome sequencing data explains the popularity of machine learning-based methods for the prediction of protein properties from their amino acid sequences. Over the years, while revising our own work, reading submitted manuscripts as well as published papers, we have noticed several recurring issues, which make some reported findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology, or conversely, machine learning experts may miss some of the knowledge needed to correctly apply their methods to proteins. Here, we aim to bridge this gap for developers of such methods. The most striking issues are linked to a lack of clarity: how were annotations of interest obtained; which benchmark metrics were used; how are positives and negatives defined. Others relate to a lack of rigor: If you sneak in structural information, your method is not sequence-based; if you compare your own model to "state-of-the-art," take the best methods; if you want to conclude that some method is better than another, obtain a significance estimate to support this claim. These, and other issues, we will cover in detail. These points may have seemed obvious to the authors during writing; however, they are not always clear-cut to the readers. We also expect many of these tips to hold for other machine learning-based applications in biology. Therefore, many computational biologists who develop methods in this particular subject will benefit from a concise overview of what to avoid and what to do instead.


Subject(s)
Benchmarking , Machine Learning , Amino Acid Sequence , Chromosome Mapping , Knowledge
4.
Bioinformatics ; 37(20): 3421-3427, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33974039

ABSTRACT

MOTIVATION: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen's epitope region, as a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. RESULTS: We collected and curated a high-quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody-antigen structure of the COVID-19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input, which will help make the method immediately applicable in a wide range of biomedical and biomolecular research. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
Bioinformatics ; 36(7): 2142-2149, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31845959

ABSTRACT

MOTIVATION: Genetic interaction (GI) patterns are characterized by the phenotypes of interacting single and double mutated gene pairs. Uncovering the regulatory mechanisms of GIs would provide a better understanding of their role in biological processes, diseases and drug response. Computational analyses can provide insights into the underpinning mechanisms of GIs. RESULTS: In this study, we present a framework for exhaustive modelling of GI patterns using Petri nets (PN). Four-node models were defined and generated on three levels with restrictions, to enable an exhaustive approach. Simulations suggest ∼5 million models of GIs. Generalizing these, we propose putative mechanisms for two GI patterns, inversion and suppression. We demonstrate that exhaustive PN modelling enables reasoning about mechanisms of GIs when only the phenotypes of gene pairs are known. The framework can be applied to other GI or genetic regulatory datasets. AVAILABILITY AND IMPLEMENTATION: The framework is available at http://www.ibi.vu.nl/programs/ExhMod. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
Bioinformatics ; 35(24): 5315-5317, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31368486

ABSTRACT

SUMMARY: PRALINE 2 is a toolkit for custom multiple sequence alignment workflows. It can be used to incorporate sequence annotations, such as secondary structure or (DNA) motifs, into the alignment scoring, as well as to customize many other aspects of a progressive multiple alignment workflow. AVAILABILITY AND IMPLEMENTATION: PRALINE 2 is implemented in Python and available as open source software on GitHub: https://github.com/ibivu/PRALINE/.


Subject(s)
Software , DNA , Protein Structure, Secondary , Sequence Alignment
7.
Bioinformatics ; 35(22): 4794-4796, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116381

ABSTRACT

MOTIVATION: Interpretation of ubiquitous protein sequence data has become a bottleneck in biomolecular research, due to a lack of structural and other experimental annotation data for these proteins. Prediction of protein interaction sites from sequence may be a viable substitute. We therefore recently developed a sequence-based random forest method for protein-protein interface prediction, which yielded significantly better performance than other methods on both homomeric and heteromeric protein-protein interactions. Here, we present a webserver that implements this method efficiently. RESULTS: With the aim of accelerating our previous approach, we obtained sequence conservation profiles by re-mastering the alignment of homologous sequences found by PSI-BLAST. This yielded a more than 10-fold speedup and at least the same accuracy as reported previously for our method; these results allowed us to offer the method as a webserver. The webserver interface is targeted to the non-expert user. The input is simply a sequence of the protein of interest, and the output is a table with scores indicating the likelihood of having an interaction interface at a certain position. As the method is sequence-based and not sensitive to the type of protein interaction, we expect this webserver to be of interest to many biological researchers in academia and in industry. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Algorithms , Amino Acid Sequence , Proteins , Sequence Analysis, Protein
8.
PLoS Comput Biol ; 15(5): e1007061, 2019 05.
Article in English | MEDLINE | ID: mdl-31083661

ABSTRACT

Genetic interactions, a phenomenon whereby combinations of mutations lead to unexpected effects, reflect how cellular processes are wired and play an important role in complex genetic diseases. Understanding the molecular basis of genetic interactions is crucial for deciphering pathway organization as well as understanding the relationship between genetic variation and disease. Several hypothetical molecular mechanisms have been linked to different genetic interaction types. However, differences in genetic interaction patterns and their underlying mechanisms have not yet been compared systematically between different functional gene classes. Here, differences in the occurrence and types of genetic interactions are compared for two classes, gene-specific transcription factors (GSTFs) and signaling genes (kinases and phosphatases). Genome-wide gene expression data for 63 single and double deletion mutants in baker's yeast reveal that the two most common genetic interaction patterns are buffering and inversion. Buffering is typically associated with redundancy and is well understood. In inversion, genes show opposite behavior in the double mutant compared to the corresponding single mutants. The underlying mechanism is poorly understood. Although both classes show buffering and inversion patterns, the prevalence of inversion is much stronger in GSTFs. To decipher potential mechanisms, a Petri Net modeling approach was employed, where genes are represented as nodes and relationships between genes as edges. This allowed over 9 million possible three- and four-node models to be exhaustively enumerated. The models show that a quantitative difference in interaction strength is a strict requirement for obtaining inversion. In addition, this difference is frequently accompanied by a second gene that shows buffering. Taken together, these results provide a mechanistic explanation for inversion. Furthermore, the ability of transcription factors to differentially regulate expression of their targets provides a likely explanation for why inversion is more prevalent for GSTFs compared to kinases and phosphatases.


Subject(s)
Gene Expression Regulation , Models, Genetic , Transcription Factors/metabolism , Chromosome Inversion , Computational Biology , Computer Simulation , Databases, Genetic , Epistasis, Genetic , Genes, Fungal , Genetic Association Studies , Mutation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Signal Transduction/genetics
9.
PLoS Comput Biol ; 14(11): e1006547, 2018 11.
Article in English | MEDLINE | ID: mdl-30383764

ABSTRACT

Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.
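
To make the idea of an α-controlled boost concrete, the sketch below adds a bonus of α to the substitution score whenever two aligned positions both lie in an annotated motif, and then runs a standard global (Needleman-Wunsch) score calculation. This is an illustrative assumption only: the abstract does not give the boost formula, so the additive rule, the match/mismatch/gap values, and the names `boosted_score` and `global_alignment_score` are hypothetical and should not be read as the PRALINE implementation.

```python
def boosted_score(a, b, a_in_motif, b_in_motif, alpha, match=1.0, mismatch=-1.0):
    """Substitution score with an assumed additive motif boost."""
    base = match if a == b else mismatch
    # Assumed rule: only motif-to-motif pairings receive the extra alpha.
    return base + alpha if (a_in_motif and b_in_motif) else base


def global_alignment_score(seq1, motif1, seq2, motif2, alpha, gap=-2.0):
    """Needleman-Wunsch optimal score with a linear gap penalty."""
    n, m = len(seq1), len(seq2)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = boosted_score(seq1[i - 1], seq2[j - 1],
                              motif1[i - 1], motif2[j - 1], alpha)
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align the two residues
                           dp[i - 1][j] + gap,     # gap in seq2
                           dp[i][j - 1] + gap)     # gap in seq1
    return dp[n][m]


# Toy sequences with a hypothetical "NG" motif marked by boolean masks.
seq1, motif1 = "GANGT", [False, False, True, True, False]
seq2, motif2 = "NGT", [True, True, False]
print(global_alignment_score(seq1, motif1, seq2, motif2, alpha=0.0))
print(global_alignment_score(seq1, motif1, seq2, motif2, alpha=1.0))
```

With α = 0 the calculation reduces to an ordinary alignment score; raising α increases the reward for alignments that place motif positions opposite each other, which is the trade-off the abstract describes between motif matching and overall alignment quality.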


Subject(s)
Amino Acid Motifs , DNA/chemistry , Proteins/chemistry , Sequence Alignment/standards , Algorithms , Amino Acid Sequence , Conserved Sequence , HIV-1/chemistry , Sequence Homology, Amino Acid , env Gene Products, Human Immunodeficiency Virus/chemistry
10.
Bioinformatics ; 33(10): 1479-1487, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28073761

ABSTRACT

MOTIVATION: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction. RESULTS: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test set. The heteromeric-trained RF interface predictor performs better than existing predictors on an independent heteromeric test set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces. AVAILABILITY AND IMPLEMENTATION: The predictors and test datasets used in our analyses are freely available ( http://www.ibi.vu.nl/downloads/RF_PPI/ ). CONTACT: k.a.feenstra@vu.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Models, Statistical , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , Computational Biology/methods , ROC Curve , Sequence Analysis, Protein/methods
11.
Bioinformatics ; 32(12): i60-i69, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307645

ABSTRACT

MOTIVATION: Biological pathways play a key role in most cellular functions. To better understand these functions, diverse computational and cell biology researchers use biological pathway data for various analysis and modeling purposes. For specifying these biological pathways, a community of researchers has defined BioPAX and provided various tools for creating, validating and visualizing BioPAX models. However, a generic software framework for simulating BioPAX models is missing. Here, we attempt to fill this gap by introducing a generic simulation framework for BioPAX. The framework explicitly separates the execution model from the model structure as provided by BioPAX, with the advantage that the modelling process becomes more reproducible and intrinsically more modular; this ensures natural biological constraints are satisfied upon execution. The framework is based on the principles of discrete event systems and multi-agent systems, and is capable of automatically generating a hierarchical multi-agent system for a given BioPAX model. RESULTS: To demonstrate the applicability of the framework, we simulated two types of biological network models: a gene regulatory network modeling the haematopoietic stem cell regulators and a signal transduction network modeling the Wnt/β-catenin signaling pathway. We observed that the results of the simulations performed using our framework were entirely consistent with the simulation results reported by the researchers who developed the original models in a proprietary language. AVAILABILITY AND IMPLEMENTATION: The framework, implemented in Java, is open source and its source code, documentation and tutorial are available at http://www.ibi.vu.nl/programs/BioASF. CONTACT: j.heringa@vu.nl.


Subject(s)
Gene Regulatory Networks , Models, Biological , Signal Transduction , Software , Computer Simulation , Humans , Programming Languages
12.
Proc Natl Acad Sci U S A ; 110(22): 8894-9, 2013 May 28.
Article in English | MEDLINE | ID: mdl-23676274

ABSTRACT

Estrogen receptor alpha (ERα) is involved in numerous physiological and pathological processes, including breast cancer. Breast cancer therapy is therefore currently directed at inhibiting the transcriptional potency of ERα, either by blocking estrogen production through aromatase inhibitors or antiestrogens that compete for hormone binding. Due to resistance, new treatment modalities are needed and as ERα dimerization is essential for its activity, interference with receptor dimerization offers a new opportunity to exploit in drug design. Here we describe a unique mechanism of how ERα dimerization is negatively controlled by interaction with 14-3-3 proteins at the extreme C terminus of the receptor. Moreover, the small-molecule fusicoccin (FC) stabilizes this ERα/14-3-3 interaction. Cocrystallization of the trimeric ERα/14-3-3/FC complex provides the structural basis for this stabilization and shows the importance of phosphorylation of the penultimate Threonine (ERα-T(594)) for high-affinity interaction. We confirm that T(594) is a distinct ERα phosphorylation site in the breast cancer cell line MCF-7 using a phospho-T(594)-specific antibody and by mass spectrometry. In line with its ERα/14-3-3 interaction stabilizing effect, fusicoccin reduces the estradiol-stimulated ERα dimerization, inhibits ERα/chromatin interactions and downstream gene expression, resulting in decreased cell proliferation. Herewith, a unique functional phosphosite and an alternative regulation mechanism of ERα are provided, together with a small molecule that selectively targets this ERα/14-3-3 interface.


Subject(s)
14-3-3 Proteins/metabolism , Breast Neoplasms/drug therapy , Drug Delivery Systems/methods , Estrogen Receptor alpha/metabolism , Glycosides/pharmacology , Models, Molecular , Protein Conformation , Amino Acid Sequence , Crystallization , Dimerization , Estrogen Receptor alpha/genetics , Female , Fluorescence Polarization , Gene Components , Gene Expression Regulation/drug effects , Humans , MCF-7 Cells , Mass Spectrometry , Molecular Sequence Data , Phosphorylation , Protein Isoforms/metabolism , Sequence Alignment
13.
BMC Bioinformatics ; 16: 325, 2015 Oct 08.
Article in English | MEDLINE | ID: mdl-26449222

ABSTRACT

BACKGROUND: Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. RESULTS: We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9% recall and up to 25.1% precision. CONCLUSIONS: To the best of our knowledge, this is the first large-scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
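
Recall and precision, as reported above, are both derived from counts of true positives, false positives and false negatives over the predicted residues. The sketch below shows the calculation in plain Python; the function name `precision_recall` and the toy prediction and annotation vectors are assumptions for illustration and do not reproduce the study's data.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length binary sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Guard against empty denominators (no predictions or no true sites).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical surface residues: 1 = interface, 0 = other surface residue.
toy_predicted = [1, 1, 0, 1, 0, 0, 1, 0]
toy_actual = [1, 0, 0, 1, 1, 0, 0, 0]
print(precision_recall(toy_predicted, toy_actual))
```

Precision answers "how many predicted interface residues are real", while recall answers "how many real interface residues were found"; the two usually trade off against each other as the prediction threshold changes.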


Subject(s)
Proteins/chemistry , Area Under Curve , Dimerization , Protein Interaction Domains and Motifs , Proteins/metabolism , ROC Curve
14.
Brief Bioinform ; 14(5): 589-98, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23603092

ABSTRACT

Teaching students with very diverse backgrounds can be extremely challenging. This article uses the Bioinformatics and Systems Biology MSc in Amsterdam as a case study to describe how the knowledge gap for students with heterogeneous backgrounds can be bridged. We show that a mix in backgrounds can be turned into an advantage by creating a stimulating learning environment for the students. In the MSc Programme, conversion classes help to bridge differences between students, by mending initial knowledge and skill gaps. Mixing students from different backgrounds in a group to solve a complex task creates an opportunity for the students to reflect on their own abilities. We explain how a truly interdisciplinary approach to teaching helps students of all backgrounds to achieve the MSc end terms. Moreover, transferable skills obtained by the students in such a mixed study environment are invaluable for their later careers.


Subject(s)
Computational Biology/education , Systems Biology/education , Curriculum , Education, Graduate , Humans , Netherlands , Students
15.
Bioinformatics ; 30(3): 326-34, 2014 Feb 01.
Article in English | MEDLINE | ID: mdl-24273239

ABSTRACT

MOTIVATION: To assess whether two proteins will interact under physiological conditions, information on the interaction free energy is needed. Statistical learning techniques and docking methods for predicting protein-protein interactions cannot quantitatively estimate binding free energies. Full atomistic molecular simulation methods do have this potential, but are completely unfeasible for large-scale applications in terms of computational cost required. Here we investigate whether applying coarse-grained (CG) molecular dynamics simulations is a viable alternative for complexes of known structure. RESULTS: We calculate the free energy barrier with respect to the bound state based on molecular dynamics simulations using both a full atomistic and a CG force field for the TCR-pMHC complex and the MP1-p14 scaffolding complex. We find that the free energy barriers from the CG simulations are of similar accuracy as those from the full atomistic ones, while achieving a speedup of >500-fold. We also observe that extensive sampling is extremely important to obtain accurate free energy barriers, which is only within reach for the CG models. Finally, we show that the CG model preserves biological relevance of the interactions: (i) we observe a strong correlation between evolutionary likelihood of mutations and the impact on the free energy barrier with respect to the bound state; and (ii) we confirm the dominant role of the interface core in these interactions. Therefore, our results suggest that CG molecular simulations can realistically be used for the accurate prediction of protein-protein interaction strength. AVAILABILITY AND IMPLEMENTATION: The python analysis framework and data files are available for download at http://www.ibi.vu.nl/downloads/bioinformatics-2013-btt675.tgz.


Subject(s)
Molecular Dynamics Simulation , Protein Interaction Mapping/methods , Multiprotein Complexes/chemistry , Multiprotein Complexes/genetics , Mutation , Thermodynamics
16.
Bioinformatics ; 29(13): i80-8, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23813012

ABSTRACT

MOTIVATION: Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements, therefore, has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here, we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes. RESULTS: Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis of our model predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as 'stepping stones' for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or 'trigger' is required to exit the stem cell state, with distinct triggers characterizing maturation into the various different lineages. By focusing on intermediate states occurring during erythrocyte differentiation, from our model we predicted a novel negative regulation of Fli1 by Gata1, which we confirmed experimentally thus validating our model. In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
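
As a rough illustration of what a Boolean network model and its state space look like, the sketch below defines a three-gene toy network, applies synchronous updates, and enumerates the steady states. The genes `A`, `B` and `C` and their rules are hypothetical and are not the 11-gene HSPC model from the abstract; the synchronous update scheme is likewise an assumption, and the strongly connected component and shortest-path analyses mentioned above are not reproduced here.

```python
from itertools import product

# Hypothetical toy network (NOT the published HSPC model).
RULES = {
    "A": lambda s: not s["B"],  # B represses A
    "B": lambda s: not s["A"],  # A represses B
    "C": lambda s: s["A"],      # A activates C
}
GENES = sorted(RULES)  # fixed gene order: ["A", "B", "C"]


def step(state):
    """Synchronous update: every gene is recomputed from the same current state."""
    current = dict(zip(GENES, state))
    return tuple(bool(RULES[g](current)) for g in GENES)


def steady_states():
    """Enumerate all 2^n states and keep those that map onto themselves."""
    return [s for s in product([False, True], repeat=len(GENES)) if step(s) == s]


for state in steady_states():
    print(dict(zip(GENES, state)))
```

A steady state maps onto itself under the update rule, so in a model of this kind the system can only leave it when a gene is changed from outside, which is one way to picture the external "trigger" described in the abstract.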


Subject(s)
Gene Regulatory Networks , Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Models, Genetic , Animals , Cell Line , Erythrocytes/cytology , Genes, Regulator , Hematopoietic Stem Cells/cytology , Mice , Transcription Factors/metabolism
17.
Retrovirology ; 10: 102, 2013 Sep 23.
Article in English | MEDLINE | ID: mdl-24059682

ABSTRACT

BACKGROUND: Current HIV-1 envelope glycoprotein (Env) vaccines are unable to induce cross-reactive neutralizing antibodies. However, such antibodies are elicited in 10-30% of HIV-1 infected individuals, but it is unknown why these antibodies are induced in some individuals and not in others. We hypothesized that the Envs of early HIV-1 variants in individuals who develop cross-reactive neutralizing activity (CrNA) might have unique characteristics that support the induction of CrNA. RESULTS: We retrospectively generated and analyzed env sequences of early HIV-1 clonal variants from 31 individuals with diverse levels of CrNA 2-4 years post-seroconversion. These sequences revealed a number of Env signatures that coincided with CrNA development. These included a statistically shorter variable region 1 and a lower probability of glycosylation as implied by a high ratio of NXS versus NXT glycosylation motifs. Furthermore, lower probability of glycosylation at position 332, which is involved in the epitopes of many broadly reactive neutralizing antibodies, was associated with the induction of CrNA. Finally, Sequence Harmony identified a number of amino acid changes associated with the development of CrNA. These residues mapped to various Env subdomains, but in particular to the first and fourth variable region as well as the underlying α2 helix of the third constant region. CONCLUSIONS: These findings imply that the development of CrNA might depend on specific characteristics of early Env. Env signatures that correlate with the induction of CrNA might be relevant for the design of effective HIV-1 vaccines.


Subject(s)
Neutralizing Antibodies/immunology , Glycoproteins/immunology , HIV Antibodies/immunology , HIV-1/immunology , Human Immunodeficiency Virus env Gene Products/immunology , Cohort Studies , Cross Reactions , Glycoproteins/genetics , HIV-1/genetics , Humans , Human Immunodeficiency Virus env Gene Products/genetics
18.
J Comput Chem ; 33(12): 1207-14, 2012 May 05.
Article in English | MEDLINE | ID: mdl-22370965

ABSTRACT

We report on a Python interface to the GROMACS molecular simulation package, GromPy (available at https://github.com/GromPy). This application programming interface (API) uses the Python ctypes module, which allows function calls to shared libraries written, for example, in C. To the best of our knowledge, this is the first reported interface to the GROMACS library that uses direct library calls. GromPy can be used for extending the current GROMACS simulation and analysis modes. In this work, we demonstrate that the interface enables hybrid Monte-Carlo/molecular dynamics (MD) simulations in the grand-canonical ensemble, a simulation mode that is currently not implemented in GROMACS. For this application, the interplay between GromPy and GROMACS requires only minor modifications of the GROMACS source code, which do not affect the operation, efficiency, or performance of the GROMACS applications. We validate the grand-canonical application against MD in the canonical ensemble by comparing equations of state. The results of the grand-canonical simulations are in complete agreement with MD in the canonical ensemble. The Python overhead of the grand-canonical scheme is minimal.
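
The ctypes mechanism the abstract refers to can be sketched as follows. This is a generic illustration, not GromPy's API: it calls `strlen` from the C standard library rather than any GROMACS function, the helper names `load_c_library` and `c_strlen` are hypothetical, and a POSIX-like system is assumed.

```python
import ctypes
import ctypes.util

def load_c_library() -> ctypes.CDLL:
    """Load the platform C library (assumes a POSIX-like system)."""
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include libc on POSIX.
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_c_library()

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(text: bytes) -> int:
    """Return the length of a byte string by calling the C function directly."""
    return libc.strlen(text)

print(c_strlen(b"GROMACS"))
```

Declaring `argtypes` and `restype` before the call is the important step: without them ctypes assumes `int` arguments and return values, which is wrong for most library functions.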


Subject(s)
Molecular Dynamics Simulation , Monte Carlo Method , Software
19.
Nucleic Acids Res ; 38(Web Server issue): W35-40, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20525785

ABSTRACT

Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein-protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user-friendly. For both methods, Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.


Subject(s)
Sequence Alignment/methods , Protein Sequence Analysis , Software , Algorithms , Internet , Smad Proteins/chemistry , Smad Proteins/classification
20.
Sci Rep ; 12(1): 10487, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35729253

ABSTRACT

Protein-protein interactions (PPI) are crucial for protein function; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. Using a deep learning strategy therefore requires overcoming the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with a multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residues. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to learn effectively from a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important when the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner trained on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset in which the protein's functional properties of interest are only partially annotated.
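
One common way to handle missing data in a multi-task cost function is to mask out unannotated positions, so that each task contributes loss only where a label exists. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: every task is treated as binary, `None` marks a missing annotation, and the names `masked_bce` and `multitask_loss`, as well as the task weights, are hypothetical.

```python
import math

EPS = 1e-12  # keeps log() finite for predictions of exactly 0 or 1

def masked_bce(probs, labels):
    """Mean binary cross-entropy over annotated residues only.

    `labels` holds 0, 1, or None per residue; None means "not annotated".
    """
    terms = []
    for p, y in zip(probs, labels):
        if y is None:
            continue  # missing annotation: contributes nothing to the loss
        p = min(max(p, EPS), 1.0 - EPS)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0

def multitask_loss(probs_by_task, labels_by_task, weights=None):
    """Weighted sum of per-task masked losses (weights default to 1.0)."""
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * masked_bce(probs_by_task[task], labels)
        for task, labels in labels_by_task.items()
    )

# A protein with structural labels but no PPI interface annotation still
# yields a training signal through the "buried" task.
probs = {"ppi": [0.9, 0.2], "buried": [0.8, 0.3]}
labels = {"ppi": [None, None], "buried": [1, 0]}
print(multitask_loss(probs, labels))
```

With this masking, proteins that lack PPI annotations can still be used for training through the related structural tasks, which is the situation the abstract describes as data extension.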


Subject(s)
Algorithms , Proteins , Proteins/metabolism