Results 1 - 20 of 40
1.
Biomedicines ; 11(4)2023 Apr 19.
Article in English | MEDLINE | ID: mdl-37189830

ABSTRACT

The extracellular matrix (ECM) is earning an increasingly relevant role in many disease states and aging. The analysis of these disease states is possible with the GWAS and PheWAS methodologies, and through our analysis, we aimed to explore the relationships between polymorphisms in the compendium of ECM genes (i.e., matrisome genes) in various disease states. A significant contribution on the part of ECM polymorphisms is evident in various types of disease, particularly those in the core-matrisome genes. Our results confirm previous links to connective-tissue disorders but also unearth new and underexplored relationships with neurological, psychiatric, and age-related disease states. Through our analysis of the drug indications for gene-disease relationships, we identify numerous targets that may be repurposed for age-related pathologies. The identification of ECM polymorphisms and their contributions to disease will play an integral role in future therapeutic developments, drug repurposing, precision medicine, and personalized care.

3.
Syst Med (New Rochelle) ; 3(1): 22-35, 2020.
Article in English | MEDLINE | ID: mdl-32226924

ABSTRACT

The First International Conference in Systems and Network Medicine gathered together 200 global thought leaders, scientists, clinicians, academicians, industry and government experts, medical and graduate students, postdoctoral scholars and policymakers. Held at Georgetown University Conference Center in Washington D.C. on September 11-13, 2019, the event featured a day of pre-conference lectures and hands-on bioinformatic computational workshops followed by two days of deep and diverse scientific talks, panel discussions with eminent thought leaders, and scientific poster presentations. Topics ranged from: Systems and Network Medicine in Clinical Practice; the role of -omics technologies in Health Care; the role of Education and Ethics in Clinical Practice, Systems Thinking, and Rare Diseases; and the role of Artificial Intelligence in Medicine. The conference served as a unique nexus for interdisciplinary discovery and dialogue and fostered formation of new insights and possibilities for health care systems advances.

4.
Hum Mutat ; 40(9): 1612-1622, 2019 09.
Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.


Subject(s)
Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing
5.
Pac Symp Biocomput ; 24: 444-448, 2019.
Article in English | MEDLINE | ID: mdl-30864345

ABSTRACT

Identifying functional elements and predicting mechanistic insight from non-coding DNA and noncoding variation remains a challenge. Advances in genome-scale, high-throughput technology, however, have brought these answers closer within reach than ever, though there is still a need for new computational approaches to analysis and integration. This workshop aims to explore these resources and new computational methods applied to regulatory elements, chromatin interactions, non-protein-coding genes, and other non-coding DNA.


Subject(s)
Computational Biology/methods , DNA/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data , CRISPR-Cas Systems , Epigenesis, Genetic , Gene Regulatory Networks , Genetic Variation , Humans , Mutation , RNA, Untranslated/genetics , Regulatory Elements, Transcriptional , Systems Biology
6.
PLoS Comput Biol ; 14(11): e1006494, 2018 11.
Article in English | MEDLINE | ID: mdl-30408027

ABSTRACT

Research in computational biology has given rise to a vast number of methods developed to solve scientific problems. For areas in which many approaches exist, researchers have a hard time deciding which tool to select to address a scientific challenge, as essentially all publications introducing a new method will claim better performance than all others. Not all of these claims can be correct. Equally, for this same reason, developers struggle to demonstrate convincingly that they created a new and superior algorithm or implementation. Moreover, the developer community often has difficulty discerning which new approaches constitute true scientific advances for the field. The obvious answer to this conundrum is to develop benchmarks (meaning standard points of reference that facilitate evaluating the performance of different tools), allowing both users and developers to compare multiple tools in an unbiased fashion.


Subject(s)
Computational Biology/methods , Algorithms , Area Under Curve , Publications
7.
BMC Med Genomics ; 11(Suppl 3): 75, 2018 Sep 14.
Article in English | MEDLINE | ID: mdl-30255817

ABSTRACT

BACKGROUND: Understanding the effect of human genetic variations on disease can provide insight into phenotype-genotype relationships, and has great potential for improving the effectiveness of personalized medicine. While some genetic markers linked to disease susceptibility have been identified, a large number are still unknown. In this paper, we propose a pathway-based approach to extend disease-variant associations and find new molecular connections between genetic mutations and diseases. METHODS: We used a compilation of over 80,000 human genetic variants with known disease associations from databases including the Online Mendelian Inheritance in Man (OMIM), Clinical Variance database (ClinVar), Universal Protein Resource (UniProt), and Human Gene Mutation Database (HGMD). Furthermore, we used the Unified Medical Language System (UMLS) to normalize variant phenotype terminologies, mapping 87% of unique genetic variants to phenotypic disorder concepts. Lastly, variants were grouped by UMLS Medical Subject Heading (MeSH) identifiers to determine pathway enrichment in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. RESULTS: By linking KEGG pathways through underlying variant associations, we elucidated connections between the human genetic variant-based disease phenome and metabolic pathways, finding novel disease connections not otherwise detected through gene-level analysis. When looking at broader disease categories, our network analysis showed that large complex diseases, such as cancers, are highly linked by their common pathways. In addition, we found Cardiovascular Diseases and Skin and Connective Tissue Diseases to have the highest number of common pathways, among 35 significant main disease category (MeSH) pairings. CONCLUSIONS: This study constitutes an important contribution to extending disease-variant connections and new molecular links between diseases. 
Novel disease connections were made by disease-pathway associations not otherwise detected through single-gene analysis. For instance, we found that mutations in different genes associated to Noonan Syndrome and Essential Hypertension share a common pathway. This analysis also provides the foundation to build novel disease-drug networks through their underlying common metabolic pathways, thus enabling new diagnostic and therapeutic interventions.


Subject(s)
Computational Biology/methods , Disease/genetics , Genome, Human , Metabolic Networks and Pathways , Mutation , Phenotype , Protein Interaction Maps , Databases, Genetic , Gene Regulatory Networks , Humans , Software , Unified Medical Language System
8.
PLoS Comput Biol ; 13(4): e1005428, 2017 04.
Article in English | MEDLINE | ID: mdl-28426665

ABSTRACT

The fight against cancer is hindered by its highly heterogeneous nature. Genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare somatic variants present only in a small fraction of lesions. Such rare somatic variants dominate the landscape of genomic mutations in cancer, yet efforts to correlate somatic mutations found in one or few individuals with functional roles have been largely unsuccessful. Traditional methods for identifying somatic variants that drive cancer are 'gene-centric' in that they consider only somatic variants within a particular gene and make no comparison to other similar genes in the same family that may play a similar role in cancer. In this work, we present oncodomain hotspots, a new 'domain-centric' method for identifying clusters of somatic mutations across entire gene families using protein domain models. Our analysis confirms that our approach creates a framework for leveraging structural and functional information encapsulated by protein domains into the analysis of somatic variants in cancer, enabling the assessment of even rare somatic variants by comparison to similar genes. Our results reveal a vast landscape of somatic variants that act at the level of domain families altering pathways known to be involved with cancer such as protein phosphorylation, signaling, gene regulation, and cell metabolism. Due to oncodomain hotspots' unique ability to assess rare variants, we expect our method to become an important tool for the analysis of sequenced tumor genomes, complementing existing methods.


Subject(s)
Computational Biology/methods , Mutation/genetics , Neoplasms/genetics , Oncogene Proteins/genetics , Protein Domains/genetics , Databases, Protein , Epidermal Growth Factor/genetics , Humans , Mitochondrial Proteins/genetics , Models, Molecular , Oncogene Proteins/classification , Protein Binding , ras Proteins/genetics
9.
Neoplasia ; 19(2): 65-74, 2017 02.
Article in English | MEDLINE | ID: mdl-28038319

ABSTRACT

The semaphorins and the plexins are a family of large, cysteine-rich proteins originally identified as regulators of axon growth and lymphocyte activation that are now known to provide motility and positional information for a number of cell and tissue types. For example, our group and others have shown that some malignancies overexpress Semaphorin 4D (S4D), which acts through its receptor Plexin-B1 (PB1) on endothelial cells to attract blood vessels from the surrounding stroma for the purpose of supporting tumor growth. While plexins are the known functional receptors for the semaphorins, there is evidence that transmembrane semaphorins may transmit a signal themselves through their short cytoplasmic tail, a phenomenon known as 'reverse signaling.' We used computational methods based upon correlated evolution of sequences of interacting proteins, mutational analysis and in vitro and in vivo measurements of tumor aggressiveness to show that when bound to PB1, transmembrane S4D associates with the Rac GTPase exchange factor T lymphoma invasion and metastasis (Tiam) 1, which activates Rac and promotes proliferation, invasion and metastasis in oral squamous cell carcinoma (OSCC) cells. These results suggest that not only can S4D production by tumor cells affect the microenvironment, but engagement of this semaphorin at the cell surface activates a reverse signaling mechanism that influences tumor aggressiveness in OSCC.


Subject(s)
Antigens, CD/metabolism , Carcinoma, Squamous Cell/metabolism , Carcinoma, Squamous Cell/pathology , Guanine Nucleotide Exchange Factors/metabolism , Mouth Neoplasms/metabolism , Mouth Neoplasms/pathology , Semaphorins/metabolism , rac GTP-Binding Proteins/metabolism , Animals , Antigens, CD/chemistry , Biopsy , Carcinoma, Squamous Cell/mortality , Cell Line, Tumor , Cell Movement , Cell Proliferation , DNA-Binding Proteins , Disease Models, Animal , Gene Expression , Guanine Nucleotide Exchange Factors/chemistry , Humans , Mice , Mouth Neoplasms/mortality , Neoplasm Metastasis , Nuclear Proteins/metabolism , PDZ Domains , Prognosis , Protein Binding , Protein Interaction Domains and Motifs , Proteomics/methods , Semaphorins/chemistry , T-Lymphoma Invasion and Metastasis-inducing Protein 1 , Transcription Factors/metabolism
10.
Hum Mutat ; 37(11): 1137-1143, 2016 11.
Article in English | MEDLINE | ID: mdl-27406314

ABSTRACT

In silico methods for detecting functionally relevant genetic variants are important for identifying genetic markers of human inherited disease. Much research has focused on protein-coding variants since coding regions have well-defined physicochemical and functional properties. However, many bioinformatics tools are not applicable to variants outside coding regions. Here, we increase the classification performance of our regulatory single-nucleotide variant predictor (RSVP) for variants that cause regulatory abnormalities from an AUC of 0.90 to 0.97 by incorporating genomic regions identified by the ENCODE project into RSVP. RSVP is comparable to a recently published tool, Genome-Wide Annotation of Variants (GWAVA); both RSVP and GWAVA perform better on regulatory variants than a traditional variant predictor, combined annotation-dependent depletion (CADD). However, our method outperforms GWAVA on variants located at similar distances to the transcription start site as the positive set (AUC: 0.96) as compared with GWAVA (AUC: 0.71). Much of this disparity is due to RSVP's incorporation of features pertaining to the nearest gene (expression, GO terms, etc.), which are not included in GWAVA. Our findings hold out the promise of a framework for the assessment of all functional regulatory variants, providing a means to predict which rare or de novo variants are of pathogenic significance.
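
The AUC values quoted in this abstract summarize how well a predictor ranks positive (regulatory) variants above negative ones. As a rough illustration only, and not the evaluation code used for RSVP or GWAVA, the sketch below computes AUC directly from its rank-based definition; the function name `auc` and the toy scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predictor scores for known regulatory (positive)
# and benign (negative) variants.
positives = [0.9, 0.8, 0.4]
negatives = [0.5, 0.3]
toy_auc = auc(positives, negatives)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported figures should be read.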


Subject(s)
Computational Biology/methods , Genomics/methods , Polymorphism, Single Nucleotide , Computer Simulation , Genetic Predisposition to Disease , Genome, Human , Humans , Machine Learning
11.
Sci Rep ; 5: 12085, 2015 Jul 10.
Article in English | MEDLINE | ID: mdl-26160052

ABSTRACT

Characterizing biomolecular interactions is crucial to the understanding of biological processes. Existing characterization methods have low spatial resolution, poor specificity, and some lack the capability for deep tissue imaging. We describe a novel technique that relies on small-angle X-ray scattering signatures from high-contrast molecular probes that correlate with the presence of biomolecular interactions. We describe a proof-of-concept study that uses a model system consisting of mixtures of monomer solutions of gold nanoparticles (GNPs) as the non-interacting species and solutions of GNP dimers linked with an organic molecule (dimethyl suberimidate) as the interacting species. We report estimates of the interaction fraction obtained with the proposed small-angle X-ray scattering characterization method exhibiting strong correlation with the known relative concentration of interacting and non-interacting species.


Subject(s)
Metal Nanoparticles/chemistry , Gold/chemistry , Models, Theoretical , Scattering, Small Angle , Solutions/chemistry , X-Ray Diffraction/methods , X-Rays
12.
J Biol Chem ; 290(35): 21642-51, 2015 Aug 28.
Article in English | MEDLINE | ID: mdl-26160172

ABSTRACT

Mac-1 exhibits a unique inhibitory activity toward IL-13-induced JAK/STAT activation and thereby regulates macrophage to foam cell transformation. However, the underlying molecular mechanism is unknown. In this study, we report the identification of IL-13Rα1, a component of the IL-13 receptor (IL-13R), as a novel ligand of integrin Mac-1, using a co-evolution-based algorithm. Biochemical analyses demonstrated that recombinant IL-13Rα1 binds Mac-1 in a purified system and supports Mac-1-mediated cell adhesion. Co-immunoprecipitation experiments revealed that endogenous Mac-1 forms a complex with IL-13Rα1 in solution, and confocal fluorescence microscopy demonstrated that these two receptors co-localize with each other on the surface of macrophages. Moreover, we found that genetic inactivation of Mac-1 promotes IL-13-induced JAK/STAT activation in macrophages, resulting in enhanced polarization along the alternative activation pathway. Importantly, we observed that Mac-1(-/-) macrophages exhibit increased expression of foam cell differentiation markers including 15-lipoxygenase and lectin-type oxidized LDL receptor-1 both in vitro and in vivo. Indeed, we found that Mac-1(-/-)LDLR(-/-) mice develop significantly more foam cells than control LDLR(-/-) mice, using an in vivo model of foam cell formation. Together, our data establish for the first time a molecular mechanism by which Mac-1 regulates the signaling activity of IL-13 in macrophages. This newly identified IL-13Rα1/Mac-1-dependent pathway may offer novel targets for therapeutic intervention in the future.


Subject(s)
Interleukin-13 Receptor alpha1 Subunit/metabolism , Interleukin-13/metabolism , Macrophage-1 Antigen/metabolism , Macrophages/metabolism , Animals , Biomarkers/metabolism , Cell Differentiation , Cell Membrane/metabolism , Cell Polarity , Evolution, Molecular , Foam Cells/cytology , Foam Cells/metabolism , Gene Silencing , Janus Kinases/metabolism , Macrophages/cytology , Mice, Inbred C57BL , Protein Binding , Recombinant Proteins/metabolism , STAT Transcription Factors/metabolism , Solutions
13.
Article in English | MEDLINE | ID: mdl-25246425

ABSTRACT

BACKGROUND: This article describes capture of biological information using a hybrid approach that combines natural language processing to extract biological entities and crowdsourcing with annotators recruited via Amazon Mechanical Turk to judge correctness of candidate biological relations. These techniques were applied to extract gene-mutation relations from biomedical abstracts with the goal of supporting production scale capture of gene-mutation-disease findings as an open source resource for personalized medicine. RESULTS: The hybrid system could be configured to provide good performance for gene-mutation extraction (precision ∼82%; recall ∼70% against an expert-generated gold standard) at a cost of $0.76 per abstract. This demonstrates that crowd labor platforms such as Amazon Mechanical Turk can be used to recruit quality annotators, even in an application requiring subject matter expertise; aggregated Turker judgments for gene-mutation relations exceeded 90% accuracy. Over half of the precision errors were due to mismatches against the gold standard hidden from annotator view (e.g., incorrect EntrezGene identifier or incorrect mutation position extracted), or incomplete task instructions (e.g., the need to exclude nonhuman mutations). CONCLUSIONS: The hybrid curation model provides a readily scalable cost-effective approach to curation, particularly if coupled with expert human review to filter precision errors. We plan to generalize the framework and make it available as open source software. DATABASE URL: http://www.mitre.org/publications/technical-papers/hybrid-curation-of-gene-mutation-relations-combining-automated.
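
The abstract does not say how the Turker judgments were aggregated. One common choice, assumed here purely for illustration, is a simple majority vote per candidate relation, scored against a gold standard. All names and the toy data below are hypothetical.

```python
def majority_vote(votes):
    """Accept a candidate relation only if a strict majority of annotators
    judged it correct. Majority voting is an assumption, not the paper's
    stated method."""
    return sum(votes) * 2 > len(votes)


def accuracy(aggregated, gold):
    """Fraction of candidate relations whose aggregated label matches gold."""
    matches = sum(1 for key in gold if aggregated[key] == gold[key])
    return matches / len(gold)


# Hypothetical judgments: candidate relation -> one True/False vote per annotator.
judgments = {
    "gene1:mutA": [True, True, False],
    "gene2:mutB": [False, False, True],
    "gene3:mutC": [True, False, True],
    "gene4:mutD": [True, True, True],
}
gold = {
    "gene1:mutA": True,
    "gene2:mutB": False,
    "gene3:mutC": False,
    "gene4:mutD": True,
}

aggregated = {key: majority_vote(votes) for key, votes in judgments.items()}
toy_accuracy = accuracy(aggregated, gold)
```

With an odd number of annotators per item there are no ties; with an even number, this sketch rejects tied items, which favors precision over recall.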


Subject(s)
Crowdsourcing/methods , Data Curation/methods , Genetic Predisposition to Disease , Information Storage and Retrieval/methods , Mutation/genetics , Natural Language Processing , Computational Biology/methods , Crowdsourcing/economics , Data Curation/economics , Databases, Genetic , Genomics , Humans
14.
Article in English | MEDLINE | ID: mdl-24303323

ABSTRACT

The fight against cancer has been hindered by its highly heterogeneous nature. Recent genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare cancer somatic mutations present only in a small fraction of lesions. For instance, the genome of a colorectal cancer in one patient can have somewhere between 50 and 100 somatic mutations, but might share only 2 or 3 mutated genes with colorectal tumor genomes from other patients. Somatic mutations that are frequently found in tumor genomes often play a significant role in tumor development and are thus classified as cancer driver mutations. However, efforts to correlate somatic mutations found in one or few individual tumor genomes with critical functional roles in tumor development have so far been unsuccessful. In this paper, we analyze cancer somatic mutations from lung and other types of cancer patients using a new approach based on aggregation of mutational data at the protein domain level. Our preliminary analysis confirms that our approach creates a framework for leveraging structural genomics and evolution into the analysis of somatic cancer mutations. We found that by incorporating information about classification of proteins and protein sites we are able to detect novel clusters of cancer somatic mutations.

15.
J Mol Biol ; 425(21): 4047-63, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23962656

ABSTRACT

Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations producing clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinning of disease can lead to development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, only accessible by manual curation or sophisticated text-mining technology to extract the relevant information. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.


Subject(s)
Computational Biology/methods , Genetic Predisposition to Disease , Genetic Variation , Genome, Human , Sequence Analysis/methods , Humans
16.
BMC Genomics ; 14 Suppl 3: S5, 2013.
Article in English | MEDLINE | ID: mdl-23819456

ABSTRACT

BACKGROUND: The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and the population size needed to have adequate statistical power. One way to circumvent this problem, which is critical for the study of rare diseases, is to study the molecular patterns emerging from functional studies of existing disease mutations. Current gene-centric analyses to study mutations in coding regions are limited by their inability to account for the functional modularity of the protein. Previous studies of the functional patterns of known human disease mutations have shown a significant tendency to cluster at protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to be disruptive of phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level. RESULTS: The results of this analysis reveal that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously revealed to be displayed by human disease mutations. We found over one hundred domain hotspots in yeast with approximately 50% in the exact same domain position as known human disease mutations. 
CONCLUSIONS: We describe an analysis using protein domains as a framework for transferring functional information by studying domain hotspots in human and yeast and relating phenotypic changes in yeast to diseases in human. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.


Subject(s)
Computational Biology/methods , Evolution, Molecular , Genetic Diseases, Inborn/genetics , Mutation/genetics , Phenotype , Humans , Protein Structure, Tertiary/genetics , Species Specificity , Yeasts
17.
Pac Symp Biocomput ; 2013: 368-72, 2013.
Article in English | MEDLINE | ID: mdl-23424141

ABSTRACT

The biggest challenge for text and data mining is to truly impact the biomedical discovery process, enabling scientists to generate novel hypotheses to address the most crucial questions. Among a number of worthy submissions, we have selected six papers that exemplify advances in text and data mining methods that have a demonstrated impact on a wide range of applications. Work presented in this session includes data mining techniques applied to the discovery of 3-way genetic interactions and to the analysis of genetic data in the context of electronic medical records (EMRs), as well as an integrative approach that combines data from genetic (SNP) and transcriptomic (microarray) sources for clinical prediction. Text mining advances include a classification method to determine whether a published article contains pharmacological experiments relevant to drug-drug interactions, a fine-grained text mining approach for detecting the catalytic sites in proteins in the biomedical literature, and a method for automatically extending a taxonomy of health-related terms to integrate consumer-friendly synonyms for medical terminologies.


Subject(s)
Computational Biology , Data Mining , Computational Biology/methods , Data Mining/methods , Drug Interactions , Electronic Health Records , Humans , Terminology as Topic
18.
Pac Symp Biocomput ; : 445-50, 2013.
Article in English | MEDLINE | ID: mdl-23424148

ABSTRACT

Emerging technologies such as single cell gene expression analysis and single cell genome sequencing provide an unprecedented opportunity to quantitatively probe biological interactions at the single cell level. This new level of insight has begun to reveal a more accurate picture of cellular behavior, and to highlight the importance of understanding cellular variation in a wide range of biological contexts. The aim of this workshop is to bring together researchers working on identifying and modeling cell heterogeneity that arises by a variety of mechanisms, including but not limited to cell-to-cell noise, cell-state switches and cell differentiation, heterogeneity in immune responses, cancer evolution, and heterogeneity in disease progression.


Subject(s)
Models, Biological , Single-Cell Analysis/statistics & numerical data , Animals , Cell Differentiation , Cell Physiological Phenomena , Computational Biology , Gene Expression , Humans , Immunity, Cellular , Neoplasms/etiology
19.
BMC Genomics ; 13 Suppl 4: S9, 2012 Jun 18.
Article in English | MEDLINE | ID: mdl-22759657

ABSTRACT

BACKGROUND: Large-scale tumor sequencing projects are now underway to identify genetic mutations that drive tumor initiation and development. Most studies take a gene-based approach to identifying driver mutations, highlighting genes mutated in a large percentage of tumor samples as those likely to contain driver mutations. However, this gene-based approach usually does not consider the position of the mutation within the gene or the functional context the position of the mutation provides. Here we introduce a novel method for mapping mutations to distinct protein domains, not just individual genes, in which they occur, thus providing the functional context for how the mutation contributes to disease. Furthermore, aggregating mutations from all genes containing a specific protein domain enables the identification of mutations that are rare at the gene level, but that occur frequently within the specified domain. These highly mutated domains potentially reveal disruptions of protein function necessary for cancer development. RESULTS: We mapped somatic mutations from the protein coding regions of 100 colon adenocarcinoma tumor samples to the genes and protein domains in which they occurred, and constructed topographical maps to depict the "mutational landscapes" of gene and domain mutation frequencies. We found significant mutation frequency in a number of genes previously known to be somatically mutated in colon cancer patients including APC, TP53 and KRAS. In addition, we found significant mutation frequency within specific domains located in these genes, as well as within other domains contained in genes having low mutation frequencies. These domain "peaks" were enriched with functions important to cancer development including kinase activity, DNA binding and repair, and signal transduction. 
CONCLUSIONS: Using our method to create the domain landscapes of mutations in colon cancer, we were able to identify somatic mutations with high potential to drive cancer development. Interestingly, the majority of the genes involved have a low mutation frequency. Therefore, the method shows good potential for identifying rare driver mutations in current, large-scale tumor sequencing projects. In addition, mapping mutations to specific domains provides the necessary functional context for understanding how the mutations contribute to the disease, and may reveal novel or more refined gene and domain target regions for drug development.
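
The core idea in this abstract, that mutations which are rare per gene can become frequent once pooled by protein domain, can be sketched in a few lines. This is a minimal illustration under assumed inputs, not the authors' pipeline: the record layout, gene labels, and domain labels below are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical toy records: (sample_id, gene, domain hit by the mutation).
mutations = [
    ("s1", "GENE_A", "domain_X"),
    ("s2", "GENE_B", "domain_X"),
    ("s3", "GENE_C", "domain_X"),
    ("s4", "GENE_D", "domain_Y"),
    ("s5", "GENE_A", "domain_Z"),
]


def count_by(records, field_index):
    """Tally mutations by one field of the record (1 = gene, 2 = domain)."""
    return Counter(record[field_index] for record in records)


gene_counts = count_by(mutations, 1)    # gene-level "landscape"
domain_counts = count_by(mutations, 2)  # domain-level "landscape"

# The most frequently hit domain, pooled across all genes that contain it.
top_domain, top_domain_count = domain_counts.most_common(1)[0]
```

In this toy data no gene carries more than two mutations, yet `domain_X` collects three across three different genes, which is the kind of domain "peak" the abstract describes. A real analysis would also need domain annotations for each protein position and a significance test for the counts, both omitted here.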


Subject(s)
Computational Biology/methods , Neoplasms/genetics , Colonic Neoplasms/genetics , Humans , Mutation/genetics
20.
J Biomed Inform ; 45(5): 835-41, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22683993

ABSTRACT

OBJECTIVES: To explore the notion of mutation-centric pharmacogenomic relation extraction and to evaluate our approach against reference pharmacogenomic relations. METHODS: From a corpus of MEDLINE abstracts relevant to genetic variation, we identify co-occurrences between drug mentions extracted using MetaMap and RxNorm, and genetic variants extracted by EMU. The recall of our approach is evaluated against reference relations curated manually in PharmGKB. We also reviewed a random sample of 180 relations in order to evaluate its precision. RESULTS: One crucial aspect of our strategy is the use of biological knowledge for identifying specific genetic variants in text, not simply gene mentions. On the 104 reference abstracts from PharmGKB, the recall of our mutation-centric approach is 33-46%. Applied to 282,000 abstracts from MEDLINE, our approach identifies pharmacogenomic relations in 4534 abstracts, with a precision of 65%. CONCLUSIONS: Compared to a relation-centric approach, our mutation-centric approach shows similar recall, but slightly lower precision. We show that both approaches have limited overlap in their results, but are complementary and can be used in combination. Rather than a solution for the automatic curation of pharmacogenomic knowledge, we see these high-throughput approaches as tools to assist biocurators in the identification of pharmacogenomic relations of interest from the published literature. This investigation also identified three challenging aspects of the extraction of pharmacogenomic relations, namely processing full-text articles, sequence validation of DNA variants and resolution of genetic variants to reference databases, such as dbSNP.
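
The co-occurrence step and the recall evaluation described above can be sketched as follows. This assumes drug and variant mentions have already been extracted per abstract (the abstract attributes that step to MetaMap, RxNorm, and EMU, none of which is called here). All identifiers are hypothetical placeholders. Note that the study estimated precision by manual review of a sample; the precision computed below against the reference set is only a stand-in for illustration.

```python
def cooccurrences(mentions_by_abstract):
    """Pair every drug mention with every variant mention found in the
    same abstract, yielding (abstract_id, drug, variant) triples."""
    pairs = set()
    for abstract_id, (drugs, variants) in mentions_by_abstract.items():
        for drug in drugs:
            for variant in variants:
                pairs.add((abstract_id, drug, variant))
    return pairs


def precision_recall(predicted, reference):
    """Precision and recall of predicted relations against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical extracted mentions: abstract id -> (drug set, variant set).
mentions = {
    "abs1": ({"drugA"}, {"var1"}),
    "abs2": ({"drugB", "drugC"}, {"var2"}),
}
# Hypothetical manually curated reference relations.
reference = {
    ("abs1", "drugA", "var1"),
    ("abs2", "drugB", "var2"),
    ("abs3", "drugD", "var3"),
}

predicted = cooccurrences(mentions)
precision, recall = precision_recall(predicted, reference)
```

Pairing every drug with every variant in an abstract is deliberately permissive, which is one reason a co-occurrence approach tends to need human review of its output.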


Subject(s)
Data Mining/methods , Databases, Genetic , Mutation , Pharmacogenetics/methods , Humans , Knowledge Bases , MEDLINE