Results 1 - 20 of 45
1.
Bioinformatics ; 40(1), 2024 01 02.
Article in English | MEDLINE | ID: mdl-38147362

ABSTRACT

MOTIVATION: Up-to-date pathway knowledge is usually presented in scientific publications for human reading, making it difficult to utilize these resources for semantic integration and computational analysis of biological pathways. We here present an approach to mining knowledge graphs by combining manual curation with automated named entity recognition and automated relation extraction. This approach allows us to study pathway-related questions in detail, which we here show using the ketamine pathway, aiming to help improve understanding of the role of gut microbiota in the antidepressant effects of ketamine. RESULTS: The thus devised ketamine pathway 'KetPath' knowledge graph comprises five parts: (i) manually curated pathway facts from images; (ii) recognized named entities in biomedical texts; (iii) identified relations between named entities; (iv) our previously constructed microbiota and pre-/probiotics knowledge bases; and (v) multiple community-accepted public databases. We first assessed the performance of automated extraction of relations between named entities using the specially designed state-of-the-art tool BioKetBERT. The query results show that we can retrieve drug actions, pathway relations, co-occurring entities, and their relations. These results uncover several biological findings, such as various gut microbes leading to increased expression of BDNF, which may contribute to the sustained antidepressant effects of ketamine. We envision that the methods and findings from this research will aid researchers who wish to integrate and query data and knowledge from multiple biomedical databases and literature simultaneously. AVAILABILITY AND IMPLEMENTATION: Data and query protocols are available in the KetPath repository at https://dx.doi.org/10.5281/zenodo.8398941 and https://github.com/tingcosmos/KetPath.


Subject(s)
Gastrointestinal Microbiome , Ketamine , Humans , Ketamine/pharmacology , Databases, Factual , Antidepressive Agents/pharmacology , Neurotransmitter Agents , Data Mining/methods
2.
PLoS Comput Biol ; 18(12): e1010669, 2022 12.
Article in English | MEDLINE | ID: mdl-36454728

ABSTRACT

The ubiquitous availability of genome sequencing data explains the popularity of machine learning-based methods for the prediction of protein properties from their amino acid sequences. Over the years, while revising our own work, reading submitted manuscripts as well as published papers, we have noticed several recurring issues, which make some reported findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology, or conversely, machine learning experts may miss some of the knowledge needed to correctly apply their methods to proteins. Here, we aim to bridge this gap for developers of such methods. The most striking issues are linked to a lack of clarity: how were annotations of interest obtained; which benchmark metrics were used; how are positives and negatives defined. Others relate to a lack of rigor: If you sneak in structural information, your method is not sequence-based; if you compare your own model to "state-of-the-art," take the best methods; if you want to conclude that some method is better than another, obtain a significance estimate to support this claim. These, and other issues, we will cover in detail. These points may have seemed obvious to the authors during writing; however, they are not always clear-cut to the readers. We also expect many of these tips to hold for other machine learning-based applications in biology. Therefore, many computational biologists who develop methods in this particular subject will benefit from a concise overview of what to avoid and what to do instead.


Subject(s)
Benchmarking , Machine Learning , Amino Acid Sequence , Chromosome Mapping , Knowledge
3.
Sci Rep ; 12(1): 18977, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36347868

ABSTRACT

Scientific publications present biological relationships but are structured for human reading, making it difficult to use this resource for semantic integration and querying. Existing databases, on the other hand, are well structured for automated analysis, but do not contain comprehensive biological knowledge. We devised an approach for constructing comprehensive knowledge graphs from these two types of resources and applied it to investigate relationships between pre-/probiotics and microbiota-gut-brain axis diseases. To this end, we created (i) a knowledge base, dubbed ppstatement, containing manually curated detailed annotations, and (ii) a knowledge base, called ppconcept, containing automatically annotated concepts. The resulting Pre-/Probiotics Knowledge Graph (PPKG) combines these two knowledge bases with three other public databases (i.e. MeSH, UMLS and SNOMED CT). To validate the performance of PPKG and to demonstrate the added value of integrating two knowledge bases, we created four biological query cases. The query cases demonstrate that we can retrieve co-occurring concepts of interest, and also that combining the two knowledge bases leads to more comprehensive query results than utilizing them separately. The PPKG enables users to pose research queries such as "which pre-/probiotics combinations may benefit depression?", potentially leading to novel biological insights.
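The query idea in this abstract, retrieving co-occurring concepts and getting more complete answers from the combined knowledge bases, can be illustrated with a minimal sketch. It assumes the graph is a plain set of (subject, predicate, object) triples. The triples, the `mentions` predicate, and both function names are hypothetical placeholders; they are not the PPKG schema or its data.

```python
def merge_graphs(*graphs):
    """Union several triple sets into one knowledge graph."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def co_occurring(graph, concept, predicate="mentions"):
    """Return concepts that share at least one source with `concept`."""
    # Sources (e.g. statements or abstracts) that mention the concept.
    sources = {s for s, p, o in graph if p == predicate and o == concept}
    # Every other concept mentioned by one of those sources.
    return {
        o for s, p, o in graph
        if p == predicate and s in sources and o != concept
    }


# Hypothetical placeholder triples, not real annotations.
curated = {
    ("statement_1", "mentions", "probiotic_X"),
    ("statement_1", "mentions", "disease_Y"),
}
automatic = {
    ("abstract_7", "mentions", "probiotic_X"),
    ("abstract_7", "mentions", "disease_Z"),
}

combined = merge_graphs(curated, automatic)
print(sorted(co_occurring(combined, "probiotic_X")))
```

Querying `curated` alone returns fewer co-occurring concepts than querying `combined`, which mirrors the abstract's point about integrating the two knowledge bases. A real deployment would more likely express this as a SPARQL query over an RDF store.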


Subject(s)
Microbiota , Probiotics , Humans , Brain-Gut Axis , Pattern Recognition, Automated , Knowledge Bases
4.
Sci Rep ; 12(1): 16047, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36163232

ABSTRACT

Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous and diverse, making comparison of models and methods difficult. Moreover, models are often evaluated only on one or two downstream tasks, making it unclear whether the models capture generally useful properties. We introduce the ProteinGLUE benchmark for the evaluation of protein representations: a set of seven per-amino-acid tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks, masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance on a variety of downstream tasks such as secondary structure and protein interaction interface prediction, compared to no pre-training. However, the larger base model does not outperform the smaller medium model. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets from https://github.com/ibivu/protein-glue .


Subject(s)
Benchmarking , Proteins , Amino Acid Sequence , Amino Acids/chemistry , Natural Language Processing
5.
Sci Rep ; 12(1): 10487, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35729253

ABSTRACT

Protein-protein interactions (PPIs) are crucial for protein function; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start from a multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residues. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein's functional properties of interest are only partially annotated.
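One common way to handle missing labels in a multi-task cost function is to mask them out, so that only annotated residues contribute to each task's loss. The sketch below illustrates that general idea with binary cross-entropy in plain Python. It is an assumption about how such a loss could look, not the authors' implementation; the task names and the use of `None` for a missing label are illustrative choices.

```python
import math


def masked_multitask_loss(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy per task, averaged over tasks with labels.

    predictions: dict mapping task name -> list of probabilities in [0, 1]
    labels:      dict mapping task name -> list of 0, 1, or None (missing)
    """
    task_losses = []
    for task, probs in predictions.items():
        # Keep only positions where this task actually has an annotation.
        pairs = [(p, y) for p, y in zip(probs, labels[task]) if y is not None]
        if not pairs:
            continue  # a task without any labels contributes nothing
        total = 0.0
        for p, y in pairs:
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        task_losses.append(total / len(pairs))
    return sum(task_losses) / len(task_losses) if task_losses else 0.0


# Hypothetical example: interface labels missing, buried-residue labels present.
preds = {"interface": [0.9, 0.2], "buried": [0.5, 0.5]}
labs = {"interface": [None, None], "buried": [1, 0]}
print(masked_multitask_loss(preds, labs))
```

Because unlabeled positions are skipped rather than treated as negatives, a protein with structural annotations but no PPI annotations can still contribute a training signal through the related tasks.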


Subject(s)
Algorithms , Proteins , Proteins/metabolism
6.
Bioinformatics ; 38(8): 2111-2118, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35150231

ABSTRACT

MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
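The abstract does not say how the ensemble combines its six architectures. As a hedged illustration only, the sketch below assumes simple averaging of per-residue scores and computes AUC-ROC through its pairwise-ranking definition. The scores and labels are toy values chosen for illustration, not PIPENN output.

```python
def ensemble_mean(per_model_scores):
    """Average per-residue scores across models (an assumed combination rule)."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


def auc_roc(scores, labels):
    """AUC-ROC as the probability that a positive outranks a negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = interface residue, 0 = non-interface residue.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.5, 0.4, 0.1]
model_b = [0.6, 0.2, 0.7, 0.65]
combined = ensemble_mean([model_a, model_b])
print(auc_roc(model_a, labels), auc_roc(model_b, labels), auc_roc(combined, labels))
```

The pairwise definition is quadratic in the number of residues, which is fine for a small example; for large benchmarks a rank-based implementation from an established library would be the more practical choice.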


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Software , Amino Acid Sequence , Nucleotides , Computational Biology/methods
7.
Bioinformatics ; 37(20): 3421-3427, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33974039

ABSTRACT

MOTIVATION: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen's epitope region, as a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. RESULTS: We collected and curated a high-quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody-antigen structure of the COVID-19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input, which will help make the method immediately applicable in a wide range of biomedical and biomolecular research. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

8.
BMC Mol Cell Biol ; 22(1): 23, 2021 Apr 23.
Article in English | MEDLINE | ID: mdl-33892639

ABSTRACT

BACKGROUND: The SARS-CoV-2 virus, the causative agent of COVID-19, consists of an assembly of proteins that determine its infectious and immunological behavior, as well as its response to therapeutics. Major structural biology efforts on these proteins have already provided essential insights into the mode of action of the virus, as well as avenues for structure-based drug design. However, not all of the SARS-CoV-2 proteins, or regions thereof, have a well-defined three-dimensional structure, and as such might exhibit ambiguous, dynamic behaviour that is not evident from static structure representations, nor from molecular dynamics simulations using these structures. MAIN: We present a website ( https://bio2byte.be/sars2/ ) that provides protein sequence-based predictions of the backbone and side-chain dynamics and conformational propensities of these proteins, as well as derived early folding, disorder, β-sheet aggregation, protein-protein interaction and epitope propensities. These predictions attempt to capture the inherent biophysical propensities encoded in the sequence, rather than context-dependent behaviour such as the final folded state. In addition, we provide the biophysical variation that is observed in homologous proteins, which gives an indication of the limits of their functionally relevant biophysical behaviour. CONCLUSION: The https://bio2byte.be/sars2/ website provides a range of protein sequence-based predictions for 27 SARS-CoV-2 proteins, enabling researchers to form hypotheses about their possible functional modes of action.


Subject(s)
SARS-CoV-2/chemistry , Viral Proteins/chemistry , Databases, Protein , Humans , Internet Access , Sequence Alignment , Sequence Analysis, Protein , Software , Viral Proteins/metabolism
9.
Health Inf Sci Syst ; 9(1): 3, 2021 Dec.
Article in English | MEDLINE | ID: mdl-33262885

ABSTRACT

Gut microbiota produce and modulate the production of neurotransmitters, which have been implicated in mental disorders. Neurotransmitters may act as a 'matchmaker' between gut microbiota imbalance and mental disorders. Most of the relevant research effort goes into the relationship between gut microbiota and neurotransmitters, or into that between neurotransmitters and mental disorders, while few studies collect and analyze the dispersed research results in systematic ways. We therefore gather the dispersed results of existing studies into a structured knowledge base for identifying and predicting potential relationships between gut microbiota and mental disorders. In this study, we propose to construct a gut microbiota knowledge graph for mental disorders, named MiKG4MD. It can be extended by linking to future ontologies, simply by adding new relationships between existing information and new entities. This extensibility is emphasized for the integration with existing popular ontologies/terminologies, e.g. UMLS, MeSH, and KEGG. We demonstrate the performance of MiKG4MD with three SPARQL query test cases. Results show that the MiKG4MD knowledge graph is an effective method to predict the relationships between gut microbiota and mental disorders.

10.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-32566135

ABSTRACT

Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.


Subject(s)
Biological Science Disciplines , Computational Biology/organization & administration , Europe , Genomics , Humans , Proteins
11.
Bioinformatics ; 36(7): 2142-2149, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31845959

ABSTRACT

MOTIVATION: Genetic interaction (GI) patterns are characterized by the phenotypes of interacting single and double mutated gene pairs. Uncovering the regulatory mechanisms of GIs would provide a better understanding of their role in biological processes, diseases and drug response. Computational analyses can provide insights into the underpinning mechanisms of GIs. RESULTS: In this study, we present a framework for exhaustive modelling of GI patterns using Petri nets (PN). Four-node models were defined and generated on three levels with restrictions, to enable an exhaustive approach. Simulations suggest ∼5 million models of GIs. Generalizing these we propose putative mechanisms for the GI patterns, inversion and suppression. We demonstrate that exhaustive PN modelling enables reasoning about mechanisms of GIs when only the phenotypes of gene pairs are known. The framework can be applied to other GI or genetic regulatory datasets. AVAILABILITY AND IMPLEMENTATION: The framework is available at http://www.ibi.vu.nl/programs/ExhMod. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
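For readers unfamiliar with Petri nets, the basic mechanics are tokens on places and transitions that fire when their input places hold enough tokens. The sketch below shows only that firing rule. The dictionary encoding, the place names `R` and `G`, and the "regulator activates gene" transition are hypothetical illustrations, not models from the published framework.

```python
def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    return all(
        marking.get(place, 0) >= needed
        for place, needed in transition["inputs"].items()
    )


def fire(marking, transition):
    """Return the marking after firing `transition`; raise if not enabled."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, needed in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - needed
    for place, produced in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + produced
    return new_marking


# Hypothetical transition: regulator R activates gene G without being consumed.
activate = {"inputs": {"R": 1}, "outputs": {"R": 1, "G": 1}}

wild_type = {"R": 1, "G": 0}
deletion = {"R": 0, "G": 0}  # R removed, as in a single deletion mutant

print(fire(wild_type, activate))
print(enabled(deletion, activate))
```

Exhaustive modelling, as described in the abstract, would enumerate many such small nets and compare the simulated token levels of single and double mutants; the enumeration itself is outside the scope of this sketch.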

12.
Oral Oncol ; 98: 8-12, 2019 11.
Article in English | MEDLINE | ID: mdl-31521885

ABSTRACT

In this era of information technology, big data analysis is entering the biomedical sciences. But what are big data, where do they come from, and what can we do with them? In this commentary, the main sources of big data are explained, especially in (head and neck) oncology. It also touches upon the need to integrate various sources of clinical, pathological and quality-of-life data, and discusses some initiatives in linking such datasets on a nationwide scale in the Netherlands. Finally, it touches upon important issues regarding governance, FAIRness of data and the need to put in place the infrastructure required to exploit the full potential of big data sets in head and neck cancer.


Subject(s)
Big Data , Medical Informatics/methods , Medical Oncology , Databases, Factual , Head and Neck Neoplasms/epidemiology , Humans , Information Dissemination , Medical Oncology/methods , Netherlands/epidemiology , Precision Medicine/methods , Quality of Health Care
13.
Bioinformatics ; 35(24): 5315-5317, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31368486

ABSTRACT

SUMMARY: PRALINE 2 is a toolkit for custom multiple sequence alignment workflows. It can be used to incorporate sequence annotations, such as secondary structure or (DNA) motifs, into the alignment scoring, as well as to customize many other aspects of a progressive multiple alignment workflow. AVAILABILITY AND IMPLEMENTATION: PRALINE 2 is implemented in Python and available as open source software on GitHub: https://github.com/ibivu/PRALINE/.


Subject(s)
Software , DNA , Protein Structure, Secondary , Sequence Alignment
14.
Bioinformatics ; 35(22): 4794-4796, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116381

ABSTRACT

MOTIVATION: Interpretation of ubiquitous protein sequence data has become a bottleneck in biomolecular research, due to a lack of structural and other experimental annotation data for these proteins. Prediction of protein interaction sites from sequence may be a viable substitute. We therefore recently developed a sequence-based random forest method for protein-protein interface prediction, which yielded significantly higher performance than other methods on both homomeric and heteromeric protein-protein interactions. Here, we present a webserver that implements this method efficiently. RESULTS: With the aim of accelerating our previous approach, we obtained sequence conservation profiles by re-mastering the alignment of homologous sequences found by PSI-BLAST. This yielded a more than 10-fold speedup with at least the same accuracy as reported previously for our method; these results allowed us to offer the method as a webserver. The webserver interface is targeted to the non-expert user. The input is simply a sequence of the protein of interest, and the output is a table with scores indicating the likelihood of having an interaction interface at a certain position. As the method is sequence-based and not sensitive to the type of protein interaction, we expect this webserver to be of interest to many biological researchers in academia and in industry. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Algorithms , Amino Acid Sequence , Proteins , Sequence Analysis, Protein
15.
PLoS Comput Biol ; 15(5): e1007061, 2019 05.
Article in English | MEDLINE | ID: mdl-31083661

ABSTRACT

Genetic interactions, a phenomenon whereby combinations of mutations lead to unexpected effects, reflect how cellular processes are wired and play an important role in complex genetic diseases. Understanding the molecular basis of genetic interactions is crucial for deciphering pathway organization as well as understanding the relationship between genetic variation and disease. Several hypothetical molecular mechanisms have been linked to different genetic interaction types. However, differences in genetic interaction patterns and their underlying mechanisms have not yet been compared systematically between different functional gene classes. Here, differences in the occurrence and types of genetic interactions are compared for two classes, gene-specific transcription factors (GSTFs) and signaling genes (kinases and phosphatases). Genome-wide gene expression data for 63 single and double deletion mutants in baker's yeast reveals that the two most common genetic interaction patterns are buffering and inversion. Buffering is typically associated with redundancy and is well understood. In inversion, genes show opposite behavior in the double mutant compared to the corresponding single mutants. The underlying mechanism is poorly understood. Although both classes show buffering and inversion patterns, the prevalence of inversion is much stronger in GSTFs. To decipher potential mechanisms, a Petri Net modeling approach was employed, where genes are represented as nodes and relationships between genes as edges. This allowed over 9 million possible three and four node models to be exhaustively enumerated. The models show that a quantitative difference in interaction strength is a strict requirement for obtaining inversion. In addition, this difference is frequently accompanied with a second gene that shows buffering. Taken together, these results provide a mechanistic explanation for inversion. 
Furthermore, the ability of transcription factors to differentially regulate expression of their targets provides a likely explanation why inversion is more prevalent for GSTFs compared to kinases and phosphatases.


Subject(s)
Gene Expression Regulation , Models, Genetic , Transcription Factors/metabolism , Chromosome Inversion , Computational Biology , Computer Simulation , Databases, Genetic , Epistasis, Genetic , Genes, Fungal , Genetic Association Studies , Mutation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Signal Transduction/genetics
16.
PLoS Comput Biol ; 14(11): e1006547, 2018 11.
Article in English | MEDLINE | ID: mdl-30383764

ABSTRACT

Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems.
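The abstract describes boosting the substitution score towards motif positions, with strength α, inside the dynamic programming step. The exact boosting formula is not given there, so the sketch below makes a simple assumption: α is added to the substitution score whenever both aligned positions lie in a motif. It uses a basic pairwise global alignment with a linear gap penalty and toy match/mismatch scores, not PRALINE's progressive multiple alignment or a real substitution matrix.

```python
def motif_aware_alignment_score(seq_a, seq_b, motif_a, motif_b, alpha,
                                match=1, mismatch=-1, gap=-2):
    """Global alignment score with an additive motif bonus (assumed form).

    motif_a / motif_b are boolean lists marking motif positions in each
    sequence; alpha is added when both aligned positions are in a motif.
    """
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if motif_a[i - 1] and motif_b[j - 1]:
                score += alpha  # boost pairs of motif positions
            dp[i][j] = max(
                dp[i - 1][j - 1] + score,  # align the two residues
                dp[i - 1][j] + gap,        # gap in seq_b
                dp[i][j - 1] + gap,        # gap in seq_a
            )
    return dp[n][m]


# Toy example: both residues of each sequence are flagged as motif positions.
print(motif_aware_alignment_score("AC", "AC", [True, True], [True, True], alpha=2))
```

Setting `alpha=0` recovers the plain alignment score, which makes the trade-off mentioned in the abstract easy to explore: larger values favour matching motif regions at the possible expense of the rest of the alignment.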


Subject(s)
Amino Acid Motifs , DNA/chemistry , Proteins/chemistry , Sequence Alignment/standards , Algorithms , Amino Acid Sequence , Conserved Sequence , HIV-1/chemistry , Sequence Homology, Amino Acid , env Gene Products, Human Immunodeficiency Virus/chemistry
17.
Antiviral Res ; 158: 213-225, 2018 10.
Article in English | MEDLINE | ID: mdl-30121196

ABSTRACT

BACKGROUND: We aimed to identify HBc amino acid differences between subgroups of chronic hepatitis B (CHB) patients. METHODS: Deep sequencing of HBc was performed in samples of 89 CHB patients (42 HBeAg positive, 47 HBeAg negative). Amino acid types were compared using Sequence Harmony to identify subgroup-specific sites between HBeAg-positive and -negative patients, and between patients with combined response and non-response to peginterferon/adefovir combination therapy. RESULTS: We identified 54 positions in HBc where the frequency of occurring amino acids was significantly different between HBeAg-positive and -negative patients. In HBeAg-negative patients, 22 positions in HBc were identified which differed between patients with treatment response and those with non-response. The fraction of non-consensus sequence on selected positions was significantly higher in HBeAg-negative patients, and was negatively correlated with HBV DNA and HBsAg levels. CONCLUSIONS: Sequence Harmony identified a number of amino acid changes associated with HBeAg status and response to peginterferon/adefovir combination therapy.


Subject(s)
Hepatitis B virus/genetics , Hepatitis B, Chronic/virology , High-Throughput Nucleotide Sequencing/methods , Viral Core Proteins/genetics , Adenine/analogs & derivatives , Adenine/therapeutic use , Adult , Antiviral Agents/therapeutic use , DNA, Viral , Drug Therapy, Combination , Female , Hepatitis B Surface Antigens , Hepatitis B e Antigens , Hepatitis B, Chronic/drug therapy , Humans , Interferon-alpha/therapeutic use , Linear Models , Male , Middle Aged , Models, Molecular , Organophosphonates/therapeutic use , Polyethylene Glycols/therapeutic use , Protein Conformation , Recombinant Proteins/therapeutic use , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology , Viral Core Proteins/chemistry
18.
Sci Rep ; 8(1): 7522, 2018 05 14.
Article in English | MEDLINE | ID: mdl-29760449

ABSTRACT

Hyperactivation of Wnt and Ras-MAPK signalling are common events in development of colorectal adenomas. Further progression from adenoma-to-carcinoma is frequently associated with 20q gain and overexpression of Aurora kinase A (AURKA). Interestingly, AURKA has been shown to further enhance Wnt and Ras-MAPK signalling. However, the molecular details of these interactions in driving colorectal carcinogenesis remain poorly understood. Here we first performed differential expression analysis (DEA) of AURKA knockdown in two colorectal cancer (CRC) cell lines with 20q gain and AURKA overexpression. Next, using an exact algorithm, Heinz, we computed the largest connected protein-protein interaction (PPI) network module of significantly deregulated genes in the two CRC cell lines. The DEA and the Heinz analyses suggest 20 Wnt and Ras-MAPK signalling genes being deregulated by AURKA, whereof β-catenin and KRAS occurred in both cell lines. Finally, shortest path analysis over the PPI network revealed eight 'connecting genes' between AURKA and these Wnt and Ras-MAPK signalling genes, of which UBE2D1, DICER1, CDK6 and RACGAP1 occurred in both cell lines. This study, first, confirms that AURKA influences deregulation of Wnt and Ras-MAPK signalling genes, and second, suggests mechanisms in CRC cell lines describing these interactions.
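The shortest path step can be pictured as a breadth-first search over an unweighted interaction network, where the intermediate nodes on the path play the role of 'connecting genes'. The sketch below assumes an adjacency-list graph with unweighted edges; the node names are abstract placeholders rather than real interactions, and the study's own network and tooling may differ.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue  # already reached by an equally short or shorter route
            parents[neighbour] = node
            if neighbour == target:
                # Walk back through the parents to rebuild the path.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def connecting_nodes(graph, source, target):
    """Intermediate nodes on a shortest path between source and target."""
    path = shortest_path(graph, source, target)
    return [] if path is None else path[1:-1]


# Placeholder network; names do not represent real protein interactions.
network = {
    "kinase": ["p1", "p2"],
    "p1": ["kinase", "target"],
    "p2": ["kinase", "p3"],
    "p3": ["p2", "target"],
    "target": ["p1", "p3"],
    "isolated": [],
}
print(connecting_nodes(network, "kinase", "target"))
```

Breadth-first search returns a single shortest path; when several equally short paths exist, collecting all of them would need a small extension that stores every parent at the same depth.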


Subject(s)
Aurora Kinase A/genetics , Aurora Kinase A/metabolism , Colorectal Neoplasms/metabolism , Gene Expression Profiling/methods , Gene Regulatory Networks , Algorithms , Caco-2 Cells , Cell Line, Tumor , Chromosomes, Human, Pair 20/genetics , Colorectal Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Gene Knockdown Techniques , Humans , MAP Kinase Signaling System , Protein Interaction Maps , Wnt Signaling Pathway , ras Proteins/metabolism
19.
Bioinformatics ; 33(10): 1479-1487, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28073761

ABSTRACT

MOTIVATION: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction. RESULTS: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test-set. The heteromeric-trained RF interface predictor performs better than existing predictors on an independent heteromeric test-set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces. AVAILABILITY AND IMPLEMENTATION: The predictors and test datasets used in our analyses are freely available (http://www.ibi.vu.nl/downloads/RF_PPI/). CONTACT: k.a.feenstra@vu.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
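
To make the reported metric concrete, the sketch below computes the area under the ROC curve as the fraction of (interface, non-interface) residue pairs in which the interface residue receives the higher score, with ties counted as half. This is the standard rank-based reading of AUC, not the authors' evaluation code. The function name `roc_auc` is hypothetical, and the labels and scores are made-up toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy per-residue labels (1 = interface, 0 = non-interface) and predictor
# scores; these values are invented for illustration only.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.4]

print(round(roc_auc(toy_labels, toy_scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.72 reported in the abstract.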


Subject(s)
Algorithms , Models, Statistical , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , Computational Biology/methods , ROC Curve , Sequence Analysis, Protein/methods
20.
Bioinformatics ; 32(12): i60-i69, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307645

ABSTRACT

MOTIVATION: Biological pathways play a key role in most cellular functions. To better understand these functions, diverse computational and cell biology researchers use biological pathway data for various analysis and modeling purposes. For specifying these biological pathways, a community of researchers has defined BioPAX and provided various tools for creating, validating and visualizing BioPAX models. However, a generic software framework for simulating BioPAX models is missing. Here, we attempt to fill this gap by introducing a generic simulation framework for BioPAX. The framework explicitly separates the execution model from the model structure as provided by BioPAX, with the advantage that the modeling process becomes more reproducible and intrinsically more modular; this ensures natural biological constraints are satisfied upon execution. The framework is based on the principles of discrete event systems and multi-agent systems, and is capable of automatically generating a hierarchical multi-agent system for a given BioPAX model. RESULTS: To demonstrate the applicability of the framework, we simulated two types of biological network models: a gene regulatory network modeling the haematopoietic stem cell regulators and a signal transduction network modeling the Wnt/β-catenin signaling pathway. We observed that the results of the simulations performed using our framework were entirely consistent with the simulation results reported by the researchers who developed the original models in a proprietary language. AVAILABILITY AND IMPLEMENTATION: The framework, implemented in Java, is open source and its source code, documentation and tutorial are available at http://www.ibi.vu.nl/programs/BioASF. CONTACT: j.heringa@vu.nl.
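
The discrete event principle named in the abstract can be sketched as a time-ordered event queue in which handling one event may schedule later ones. The example below is a generic illustration in Python, not the BioASF API (the framework itself is implemented in Java). The function `run_simulation`, the event names, and the delays are all hypothetical, and the multi-agent hierarchy described in the abstract is omitted.

```python
import heapq


def no_followups(time):
    """Default handler: an event that triggers nothing further."""
    return []


def run_simulation(initial_events, handlers, until):
    """Process events in time order; return the log of (time, name) pairs."""
    queue = []
    counter = 0  # tie-breaker so simultaneous events keep insertion order
    for time, name in initial_events:
        heapq.heappush(queue, (time, counter, name))
        counter += 1
    log = []
    while queue:
        time, _, name = heapq.heappop(queue)
        if time > until:
            break
        log.append((time, name))
        # A handler returns (delay, event_name) pairs to schedule.
        for delay, new_name in handlers.get(name, no_followups)(time):
            heapq.heappush(queue, (time + delay, counter, new_name))
            counter += 1
    return log


# Hypothetical cascade: a signal leads to a complex, then to gene expression.
toy_handlers = {
    "signal_on": lambda t: [(2, "complex_formed")],
    "complex_formed": lambda t: [(3, "gene_expressed")],
}

for entry in run_simulation([(0, "signal_on")], toy_handlers, until=10):
    print(entry)
```

Keeping the handlers in a separate table from the scheduling loop loosely mirrors the separation of model structure from execution model that the abstract describes.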


Subject(s)
Gene Regulatory Networks , Models, Biological , Signal Transduction , Software , Computer Simulation , Humans , Programming Languages
...