Results 1 - 4 of 4
1.
BMC Bioinformatics ; 23(1): 4, 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-34983371

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTMs). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTMs from the literature, using deep learning trained on distantly supervised data to aid human curation. METHOD: We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high confidence predictions. RESULTS AND CONCLUSION: The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets), and filtered [Formula: see text] (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration.
We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
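
The idea of combining ensemble average confidence with confidence variation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_high_quality`, the input layout, and the thresholds `min_mean` and `max_std` are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def select_high_quality(ensemble_probs, min_mean=0.9, max_std=0.05):
    """Keep predictions whose ensemble confidence is high and stable.

    ensemble_probs: one entry per example; each entry is a list with one
    class-probability vector per ensemble member (hypothetical layout).
    min_mean / max_std: illustrative thresholds, not values from the paper.
    """
    selected = []
    for idx, per_model in enumerate(ensemble_probs):
        n_classes = len(per_model[0])
        # Average each class probability over the ensemble members.
        avg = [mean([m[c] for m in per_model]) for c in range(n_classes)]
        label = max(range(n_classes), key=lambda c: avg[c])
        # Variation of the winning class's confidence across members.
        spread = pstdev([m[label] for m in per_model])
        if avg[label] >= min_mean and spread <= max_std:
            selected.append((idx, label, avg[label], spread))
    return selected


# Toy probabilities for three examples and three ensemble members.
toy = [
    [[0.05, 0.95], [0.04, 0.96], [0.06, 0.94]],  # confident and consistent
    [[0.05, 0.95], [0.40, 0.60], [0.01, 0.99]],  # average confidence too low
    [[0.00, 1.00], [0.00, 1.00], [0.28, 0.72]],  # confident on average, unstable
]
kept = select_high_quality(toy)
```

Under these assumed thresholds only the first toy example survives: the second fails the mean-confidence test and the third fails the variation test, which is the kind of case an average-only filter would let through.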


Subjects
Data Mining , Protein Processing, Post-Translational , Humans , Proteins , PubMed
2.
BMC Cancer ; 15: 883, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26553226

ABSTRACT

BACKGROUND: The development of androgen resistance is a major limitation to androgen deprivation treatment in prostate cancer. We have developed an in vitro model of androgen resistance to characterise molecular changes occurring as androgen resistance evolves over time. Our aim is to understand biological network profiles of transcriptomic changes occurring during the transition to androgen resistance and to validate these changes between our in vitro model and clinical datasets (paired samples before and after androgen-deprivation therapy of patients with advanced prostate cancer). METHODS: We established an androgen-independent subline from LNCaP cells by prolonged exposure to androgen deprivation. We examined phenotypic profiles and performed RNA-sequencing. The reads generated were compared to human clinical samples and were analysed using differential expression, pathway analysis and protein-protein interaction networks. RESULTS: After 24 weeks of androgen deprivation, LNCaP cells had increased proliferative and invasive behaviour compared to parental LNCaP, and their growth was no longer responsive to androgen. We identified key genes and pathways that overlap between our cell line and clinical RNA sequencing datasets and analysed the overlapping protein-protein interaction network that shared the same pattern of behaviour in both datasets. Mechanisms bypassing androgen receptor signalling pathways are significantly enriched. Several steroid hormone receptors are differentially expressed in both datasets. In particular, the progesterone receptor is significantly differentially expressed and is part of the interaction network disrupted in both datasets. Other signalling pathways commonly altered in prostate cancer, MAPK and PI3K-Akt pathways, are significantly enriched in both datasets.
CONCLUSIONS: The overlap between the human and cell-line differential expression profiles and protein networks was statistically significant, showing that the cell-line model reproduces molecular patterns observed in clinical castrate-resistant prostate cancer samples, making this cell line a useful tool in understanding castrate-resistant prostate cancer. Pathway analysis revealed similar patterns of enriched pathways from differentially expressed genes of both human clinical and cell line datasets. Our analysis revealed several potential mechanisms and network interactions, including cooperative behaviours of other nuclear receptors, in particular the subfamily of steroid hormone receptors such as PGR, and alterations to gene expression in both the MAPK and PI3K-Akt signalling pathways.
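
The abstract does not say which test was used to judge the overlap between the two differential expression profiles. One common choice for this kind of question is a one-sided hypergeometric test, sketched below as an assumption and not as the authors' method. The function name `overlap_p_value` and all gene counts are hypothetical.

```python
from math import comb


def overlap_p_value(overlap, background, set_a, set_b):
    """P(X >= overlap) when set_b genes are drawn at random from
    `background` genes, of which set_a belong to the first list.

    A one-sided hypergeometric tail; assumes both gene lists come from
    the same background (an illustrative simplification).
    """
    total = comb(background, set_b)
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, i) * comb(background - set_a, set_b - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Made-up numbers: 20000 background genes, 800 and 600 differentially
# expressed genes in the two datasets, 90 shared between them.
p = overlap_p_value(overlap=90, background=20000, set_a=800, set_b=600)
```

With these invented counts the expected overlap by chance is 800 * 600 / 20000 = 24 genes, so an observed overlap of 90 gives a very small tail probability.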


Subjects
Androgens/therapeutic use , Prostatic Neoplasms, Castration-Resistant/genetics , Protein Interaction Maps/genetics , Receptors, Androgen/biosynthesis , Receptors, Progesterone/biosynthesis , Androgens/metabolism , Cell Line, Tumor , Cell Movement/genetics , Cell Proliferation/genetics , Cell Survival/genetics , Gene Expression Regulation, Neoplastic , High-Throughput Nucleotide Sequencing , Humans , Male , Mitogen-Activated Protein Kinase Kinases/genetics , Neoplasm Proteins/biosynthesis , Phosphatidylinositol 3-Kinases/genetics , Prostatic Neoplasms, Castration-Resistant/drug therapy , Prostatic Neoplasms, Castration-Resistant/pathology , Receptors, Androgen/metabolism , Receptors, Progesterone/metabolism , Signal Transduction/genetics
3.
Database (Oxford) ; 2019, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-30689846

ABSTRACT

The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of the BioCreative VI Precision Medicine Track was to extract this particular type of information, and it was organized in two tasks: (i) document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations and (ii) relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results we find the precision medicine track to be successful in engaging the text-mining research community. In the meantime, the track produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
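
Comparing system predictions with human annotations comes down to precision, recall and F-score. The sketch below shows the standard definitions on sets of document identifiers; the function name `precision_recall_f1` and the identifiers are hypothetical, and the official task evaluation may differ in detail (for example in how homologous genes are matched).

```python
def precision_recall_f1(gold, predicted):
    """Standard set-based precision, recall and balanced F-score.

    gold: items marked relevant by human annotators.
    predicted: items a system marked relevant.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up document identifiers for illustration.
gold_docs = {"doc1", "doc2", "doc3", "doc4"}
system_docs = {"doc1", "doc2", "doc9"}
p, r, f = precision_recall_f1(gold_docs, system_docs)
```

Note that the best precision, recall and F-score quoted in the abstract may each come from a different system, so the best F-score is not the harmonic mean of the best precision and the best recall.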


Subjects
Data Mining/methods , Databases, Protein , Mutation , Precision Medicine/methods , Protein Interaction Maps , Software , Computational Biology/methods , Humans , Mutation/genetics , Mutation/physiology , Protein Interaction Mapping , Protein Interaction Maps/genetics , Protein Interaction Maps/physiology
4.
Database (Oxford) ; 2018, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-30576491

ABSTRACT

Precision medicine aims to provide personalized treatments based on individual patient profiles. One critical step towards precision medicine is leveraging knowledge derived from biomedical publications, a tremendous literature resource presenting the latest scientific discoveries on genes, mutations and diseases. Biomedical natural language processing (BioNLP) plays a vital role in supporting automation of this process. BioCreative VI Track 4 brings community effort to the task of automatically identifying and extracting protein-protein interactions (PPIs) affected by mutations (PPIm), important in the precision medicine context for capturing individual genotype variation related to disease. We present the READ-BioMed team's approach to identifying PPIm-related publications and to extracting specific PPIm information from those publications in the context of the BioCreative VI PPIm track. We observe that current BioNLP tools are insufficient to recognise entities for these two tasks; the best existing mutation recognition tool achieves only 55% recall in the document triage training set, while relation extraction performance is limited by the low recall performance of gene entity recognition. We develop the models accordingly: for document triage, we develop term lists capturing interactions and mutations to complement BioNLP tools, and select effective features via a feature contribution study, whereas an ensemble of BioNLP tools is employed for relation extraction. Our best document triage model achieves an F-score of 66.77% while our best model for relation extraction achieved an F-score of 35.09% over the final (updated post-task) test set. Impacting the document triage task, the characteristics of mutations are statistically different in the training and testing sets.
While a vital new direction for biomedical text mining research, this early attempt to tackle the problem of identifying genetic variation of substantial biological significance highlights the importance of representative training data and the cascading impact of tool limitations in a modular system.
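
Term lists used as triage features can be sketched roughly as follows. The term stems, the point-mutation pattern, the feature names and the example sentence are all assumptions made for illustration; the abstract does not give the team's actual lists or features.

```python
import re

# Hypothetical word stems; a real system would curate much longer lists.
INTERACTION_STEMS = ("interact", "bind", "complex", "associat")
MUTATION_STEMS = ("mutation", "mutant", "variant", "substitut", "delet")
# Simple protein point-mutation pattern such as "A123T" (assumed format).
POINT_MUTATION = re.compile(
    r"\b[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b"
)


def count_stems(text, stems):
    """Count words in `text` that start with any of the given stems."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(stem), lowered)) for stem in stems
    )


def triage_features(text):
    """Turn one abstract into a small dictionary of term-list features."""
    interactions = count_stems(text, INTERACTION_STEMS)
    mutations = count_stems(text, MUTATION_STEMS)
    point_mutations = len(POINT_MUTATION.findall(text))
    return {
        "interaction_terms": interactions,
        "mutation_terms": mutations,
        "point_mutations": point_mutations,
        "has_both": interactions > 0 and (mutations + point_mutations) > 0,
    }


# Synthetic sentence, not taken from any real publication.
example = (
    "The A123T mutation in protein X abolishes binding to protein Y, "
    "and the mutant no longer interacts with Z."
)
features = triage_features(example)
```

Features like these could be fed to any standard classifier alongside the output of existing BioNLP tools, which is the complementary role the abstract describes for the term lists.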


Subjects
Data Mining/methods , Medical Informatics/methods , Natural Language Processing , Precision Medicine/methods , Biomedical Research , Humans , Mutation