Results 1 - 7 of 7
1.
Bioinform Adv ; 2(1): vbac034, 2022.
Article in English | MEDLINE | ID: mdl-36699362

ABSTRACT

Summary: Gilda is a software tool and web service that implements a scored string matching algorithm for names and synonyms across entries in biomedical ontologies covering genes, proteins (and their families and complexes), small molecules, biological processes and diseases. Gilda integrates machine-learned disambiguation models to choose between ambiguous strings given relevant surrounding text as context, and supports species-prioritization in case of ambiguity. Availability and implementation: The Gilda web service is available at http://grounding.indra.bio with source code, documentation and tutorials available via https://github.com/indralab/gilda. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
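To illustrate what scored string matching against a lexicon of names and synonyms can look like, here is a minimal Python sketch that grounds a query against a tiny in-memory lexicon. It is not Gilda's implementation or API: the `Term` record, the normalization rule, the identifiers `EX:0001`/`EX:0002` and the score weights are all hypothetical. In this sketch, a match on an official name scores higher than a match on a synonym, and a match that only succeeds after case and punctuation normalization scores lower than an exact match.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    entry_id: str  # hypothetical identifier, e.g. "EX:0001"
    text: str      # name or synonym as written in the lexicon
    status: str    # "name" or "synonym"


# Hypothetical weights: an official name counts more than a synonym.
STATUS_WEIGHT = {"name": 1.0, "synonym": 0.8}

# Hypothetical toy lexicon with one ambiguous synonym ("ER").
LEXICON = [
    Term("EX:0001", "ESR1", "name"),
    Term("EX:0001", "ER", "synonym"),
    Term("EX:0002", "endoplasmic reticulum", "name"),
    Term("EX:0002", "ER", "synonym"),
]


def normalize(s: str) -> str:
    """Lowercase and drop whitespace, hyphens and underscores."""
    return re.sub(r"[\s\-_]+", "", s.lower())


def match_score(query: str, term: Term) -> float:
    """Score one lexicon term against the query (0.0 means no match)."""
    if query == term.text:
        match = 1.0  # exact, case-sensitive match
    elif normalize(query) == normalize(term.text):
        match = 0.9  # matches only after normalization
    else:
        return 0.0
    return match * STATUS_WEIGHT[term.status]


def ground(query: str, lexicon: list[Term]) -> list[tuple[str, float]]:
    """Return (entry_id, best score) pairs, highest score first."""
    best: dict[str, float] = {}
    for term in lexicon:
        score = match_score(query, term)
        if score > best.get(term.entry_id, 0.0):
            best[term.entry_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

With this toy lexicon, `ground("ER", LEXICON)` returns both entries with the same score of 0.8. That tie is the kind of ambiguity Gilda addresses with its context-based disambiguation models and species prioritization, neither of which this sketch attempts.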

2.
BMC Genomics ; 21(1): 773, 2020 Nov 10.
Article in English | MEDLINE | ID: mdl-33167858

ABSTRACT

BACKGROUND: Information on protein-protein interactions affected by mutations is very useful for understanding the biological effect of mutations and for developing treatments targeting the interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a protein-protein interaction (PPI) affected by a mutation. RESULTS: Our system makes use of the latest NLP methods together with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score. CONCLUSIONS: The performance of our method indicates that it is ideally suited to being combined with manual annotation. Our machine learning framework and engineered features should also help other researchers further improve this and other related biological text-mining tasks, using either traditional machine learning or deep learning based methods.


Subject(s)
Data Mining , Natural Language Processing , Protein Interaction Mapping , Machine Learning , Mutation
4.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30689846

ABSTRACT

The Precision Medicine Initiative is a multicenter effort aiming to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of the BioCreative VI Precision Medicine Track was to extract this particular type of information, and the track was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
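The triage and relation-extraction results are reported as precision, recall and F-score. Assuming the standard F1 definition (the harmonic mean of precision and recall), a minimal Python sketch of the F-beta formula shows that the three "best" triage figures cannot all come from one system: combining the best precision (62.89%) with the best recall (98.0%) would give an F1 of about 76.6%, above the best reported F-score of 69.06%. The function names below are illustrative only.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives F1 (harmonic mean), beta > 1 weights recall more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Best triage precision and best triage recall, combined as if from one system.
print(round(f_score(0.6289, 0.98), 4))  # prints 0.7661
```

A `beta` above 1 is one way to express a preference for recall, which matters for triage systems whose output is later checked by human curators.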


Subject(s)
Data Mining/methods , Databases, Protein , Mutation , Precision Medicine/methods , Protein Interaction Maps , Software , Computational Biology/methods , Humans , Mutation/genetics , Mutation/physiology , Protein Interaction Mapping , Protein Interaction Maps/genetics , Protein Interaction Maps/physiology
5.
Sci Rep ; 7(1): 4747, 2017 07 06.
Article in English | MEDLINE | ID: mdl-28684774

ABSTRACT

Drug and xenobiotic metabolizing enzymes (DXME) play important roles in drug responses and carcinogenesis. Recent studies have found that expression of DXME in cancer cells significantly affects drug clearance and the onset of drug resistance. In this study we compared the expression of DXME in breast tumor tissue samples from patients representing three ethnic groups: Caucasian Americans (CA), African Americans (AA), and Asian Americans (AS). We further combined DXME gene expression data with eQTL data from the GTEx project and with allele frequency data from the 1000 Genomes project to identify SNPs that may be associated with differential expression of DXME genes. We identified substantial differences among CA, AA, and AS populations in the expression of DXME genes and in activation of pathways involved in drug metabolism, including those involved in metabolizing chemotherapy drugs that are commonly used in the treatment of breast cancer. These data suggest that differential expression of DXME may be associated with the health disparities in breast cancer outcomes observed among these three ethnic groups. Our study suggests that development of personalized treatment strategies for breast cancer patients could be improved by considering both germline genotypes and tumor-specific mutations and expression profiles related to DXME genes.


Subject(s)
Antineoplastic Agents/metabolism , Breast Neoplasms/genetics , Cytochrome P-450 Enzyme System/genetics , Gene Expression Regulation, Neoplastic , Inactivation, Metabolic/genetics , Neoplasm Proteins/genetics , Alleles , Antineoplastic Agents/therapeutic use , Asian People , Black People , Breast Neoplasms/drug therapy , Breast Neoplasms/enzymology , Breast Neoplasms/ethnology , Cytochrome P-450 Enzyme System/classification , Cytochrome P-450 Enzyme System/metabolism , Databases, Factual , Female , Gene Frequency , Healthcare Disparities , Humans , Neoplasm Proteins/classification , Neoplasm Proteins/metabolism , Neoplasm Staging , Precision Medicine , Treatment Outcome , White People , Xenobiotics/metabolism , Xenobiotics/therapeutic use
6.
Cancer Res ; 77(2): 423-433, 2017 01 15.
Article in English | MEDLINE | ID: mdl-28069798

ABSTRACT

Asian Americans (AS) have significantly lower incidence and mortality rates of breast cancer than Caucasian Americans (CA). Although this racial disparity has been documented, the underlying pathogenetic factors explaining it are obscure. We addressed this issue by an integrative genomics approach to compare mRNA expression between AS and CA cases of breast cancer. RNA-seq data from the Cancer Genome Atlas showed that mRNA expression revealed significant differences at gene and pathway levels. Increased susceptibility and severity in CA patients were likely the result of synergistic environmental and genetic risk factors, with arachidonic acid metabolism and PPAR signaling pathways implicated in linking environmental and genetic factors. An analysis that also added eQTL data from the Genotype-Tissue Expression Project and SNP data from the 1,000 Genomes Project identified several SNPs associated with differentially expressed genes. Overall, the associations we identified may enable a more focused study of genotypic differences that may help explain the disparity in breast cancer incidence and mortality rates in CA and AS populations and inform precision medicine. Cancer Res; 77(2); 423-33. ©2016 AACR.


Subject(s)
Breast Neoplasms/ethnology , Breast Neoplasms/genetics , Precision Medicine/methods , RNA, Messenger/genetics , Adult , Aged , Asian/genetics , Female , Gene Expression Profiling , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Polymorphism, Single Nucleotide , RNA, Messenger/analysis , Transcriptome , White People/genetics
7.
Nucleic Acids Res ; 42(Web Server issue): W377-81, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24831547

ABSTRACT

Comparison of ribonucleic acid (RNA) molecules is important for revealing their evolutionary relationships, predicting their functions and predicting their structures. Many methods have been developed for comparing RNAs using either sequence or three-dimensional (3D) structure (backbone geometry) information. Sequences and 3D structures contain non-overlapping sets of information that both determine RNA functions. When comparing RNA 3D structures, both types of information need to be taken into account. However, few methods compare RNA structures using both sequence and 3D structure information. Recently, we developed a new method based on elastic shape analysis (ESA) that compares RNA molecules by combining both sequence and 3D structure information. ESA treats RNA structures as 3D curves with sequence information encoded on additional coordinates, so that the alignment can be performed in the joint sequence-structure space. The similarity between two RNA molecules is quantified by a formal distance, the geodesic distance. In this study, we implement a web server for the method, called RASS, to make it publicly available to the research community. The web server is located at http://cloud.stat.fsu.edu/RASS/.
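Elastic shape analysis is commonly built on the square-root velocity function (SRVF), q(t) = β'(t)/√|β'(t)|, under which the distance between length-normalized curves is an arc length (geodesic) on a unit sphere. The abstract does not give RASS's exact formulation, so the following is only a simplified Python sketch under stated assumptions: the sequence encoding (weighted one-hot coordinates for A, C, G, U appended to each 3D point) is hypothetical, both curves are assumed to have the same number of points already in correspondence, and the optimization over rotations and reparameterizations that makes the method elastic is omitted.

```python
import math

# Hypothetical sequence encoding: one extra coordinate per nucleotide type,
# scaled by a chosen weight. The actual RASS encoding may differ.
BASES = "ACGU"


def encode(coords, sequence, weight=1.0):
    """Append weighted one-hot base coordinates to each 3D backbone point."""
    return [
        list(xyz) + [weight if base == b else 0.0 for b in BASES]
        for xyz, base in zip(coords, sequence)
    ]


def srvf(curve):
    """Discrete SRVF q = v / sqrt(|v|) on segments, with t spanning [0, 1]."""
    dt = 1.0 / (len(curve) - 1)
    q = []
    for p0, p1 in zip(curve, curve[1:]):
        v = [(b - a) / dt for a, b in zip(p0, p1)]
        speed = math.sqrt(sum(c * c for c in v))
        q.append([c / math.sqrt(speed) for c in v] if speed > 0 else [0.0] * len(v))
    return q, dt


def inner(q1, q2, dt):
    """Discrete L2 inner product of two SRVFs."""
    return sum(sum(a * b for a, b in zip(u, w)) for u, w in zip(q1, q2)) * dt


def preshape_geodesic(curve1, curve2):
    """Arc length between the unit-normalized SRVFs of two curves.

    Assumes equal point counts in correspondence; no rotation or
    reparameterization search is performed (not a full elastic distance).
    """
    q1, dt = srvf(curve1)
    q2, _ = srvf(curve2)
    n1 = math.sqrt(inner(q1, q1, dt))
    n2 = math.sqrt(inner(q2, q2, dt))
    cos_theta = inner(q1, q2, dt) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
```

Because the SRVF is built from derivatives and then normalized, a translated and uniformly scaled copy of a curve gets a distance of (numerically) zero. For the four-point backbone `[(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]`, encoding it with the sequence "ACGU" versus "AAAA" gives arccos(1/√3), about 0.955 radians, so in this toy encoding the sequence difference alone separates the two curves.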


Subject(s)
RNA/chemistry , Software , Internet , Nucleic Acid Conformation , Sequence Alignment , Sequence Analysis, RNA