Search | VHL Regional Portal

Domain landscapes of somatic mutations in cancer.

Nehrt, Nathan L; Peterson, Thomas A; Park, DoHwan; Kann, Maricel G.

BMC Genomics ; 13 Suppl 4: S9, 2012 Jun 18.

Article in English | MEDLINE | ID: mdl-22759657

ABSTRACT

BACKGROUND: Large-scale tumor sequencing projects are now underway to identify genetic mutations that drive tumor initiation and development. Most studies take a gene-based approach to identifying driver mutations, highlighting genes mutated in a large percentage of tumor samples as those likely to contain driver mutations. However, this gene-based approach usually does not consider the position of the mutation within the gene or the functional context the position of the mutation provides. Here we introduce a novel method for mapping mutations to distinct protein domains, not just individual genes, in which they occur, thus providing the functional context for how the mutation contributes to disease. Furthermore, aggregating mutations from all genes containing a specific protein domain enables the identification of mutations that are rare at the gene level, but that occur frequently within the specified domain. These highly mutated domains potentially reveal disruptions of protein function necessary for cancer development. RESULTS: We mapped somatic mutations from the protein coding regions of 100 colon adenocarcinoma tumor samples to the genes and protein domains in which they occurred, and constructed topographical maps to depict the "mutational landscapes" of gene and domain mutation frequencies. We found significant mutation frequency in a number of genes previously known to be somatically mutated in colon cancer patients including APC, TP53 and KRAS. In addition, we found significant mutation frequency within specific domains located in these genes, as well as within other domains contained in genes having low mutation frequencies. These domain "peaks" were enriched with functions important to cancer development including kinase activity, DNA binding and repair, and signal transduction. 
CONCLUSIONS: Using our method to create the domain landscapes of mutations in colon cancer, we were able to identify somatic mutations with high potential to drive cancer development. Interestingly, the majority of the genes involved have a low mutation frequency. Therefore, the method shows good potential for identifying rare driver mutations in current, large-scale tumor sequencing projects. In addition, mapping mutations to specific domains provides the necessary functional context for understanding how the mutations contribute to the disease, and may reveal novel or more refined gene and domain target regions for drug development.
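The domain-level aggregation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names, domain identifiers, coordinates, and mutations below are hypothetical, and the only thing counted is how many distinct samples carry a mutation in each gene and in each domain, with no significance testing.

```python
from collections import defaultdict

# Hypothetical domain annotations: gene -> list of (domain_id, start, end),
# using 1-based inclusive protein coordinates. Not data from the paper.
DOMAINS = {
    "GENE_A": [("PF_KINASE", 10, 100)],
    "GENE_B": [("PF_KINASE", 50, 140), ("PF_SH2", 150, 200)],
    "GENE_C": [("PF_KINASE", 5, 95)],
}

# Hypothetical somatic mutations: (sample_id, gene, protein_position).
MUTATIONS = [
    ("S1", "GENE_A", 42),
    ("S2", "GENE_B", 77),
    ("S3", "GENE_C", 60),
    ("S4", "GENE_B", 160),
    ("S5", "GENE_A", 300),  # lies outside every annotated domain
]


def domains_at(gene, position):
    """Return the ids of all annotated domains covering a protein position."""
    return [dom for dom, start, end in DOMAINS.get(gene, [])
            if start <= position <= end]


def mutation_frequencies(mutations):
    """Count distinct mutated samples per gene and per domain.

    Domain counts pool samples across every gene that contains the domain,
    so a domain can be mutated more often than any single gene.
    """
    gene_samples = defaultdict(set)
    domain_samples = defaultdict(set)
    for sample, gene, position in mutations:
        gene_samples[gene].add(sample)
        for dom in domains_at(gene, position):
            domain_samples[dom].add(sample)
    gene_counts = {g: len(s) for g, s in gene_samples.items()}
    domain_counts = {d: len(s) for d, s in domain_samples.items()}
    return gene_counts, domain_counts


GENE_COUNTS, DOMAIN_COUNTS = mutation_frequencies(MUTATIONS)
```

In this toy data no gene is mutated in more than two samples, while the hypothetical PF_KINASE domain is mutated in three, which is the kind of signal that is rare at the gene level but frequent at the domain level.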

Subject(s)

Computational Biology/methods , Neoplasms/genetics , Colonic Neoplasms/genetics , Humans , Mutation/genetics

Incorporating molecular and functional context into the analysis and prioritization of human variants associated with cancer.

Peterson, Thomas A; Nehrt, Nathan L; Park, Dohwan; Kann, Maricel G.

J Am Med Inform Assoc ; 19(2): 275-83, 2012.

Article in English | MEDLINE | ID: mdl-22319177

ABSTRACT

BACKGROUND AND OBJECTIVE: With recent breakthroughs in high-throughput sequencing, identifying deleterious mutations is one of the key challenges for personalized medicine. At the gene and protein level, it has proven difficult to determine the impact of previously unknown variants. A statistical method has been developed to assess the significance of disease mutation clusters on protein domains by incorporating domain functional annotations to assist in the functional characterization of novel variants. METHODS: Disease mutations aggregated from multiple databases were mapped to domains, and were classified as either cancer- or non-cancer-related. The statistical method for identifying significantly disease-associated domain positions was applied to both sets of mutations and to randomly generated mutation sets for comparison. To leverage the known function of protein domain regions, the method optionally distributes significant scores to associated functional feature positions. RESULTS: Most disease mutations are localized within protein domains and display a tendency to cluster at individual domain positions. The method identified significant disease mutation hotspots in both the cancer and non-cancer datasets. The domain significance scores (DS-scores) for cancer form a bimodal distribution with hotspots in oncogenes forming a second peak at higher DS-scores than non-cancer, and hotspots in tumor suppressors have scores more similar to non-cancers. In addition, on an independent mutation benchmarking set, the DS-score method identified mutations known to alter protein function with very high precision. CONCLUSION: By aggregating mutations with known disease association at the domain level, the method was able to discover domain positions enriched with multiple occurrences of deleterious mutations while incorporating relevant functional annotations. 
The method can be incorporated into translational bioinformatics tools to characterize rare and novel variants within large-scale sequencing studies.
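The idea of testing whether mutations cluster at individual domain positions can be sketched as follows. This is not the DS-score from the article, whose formula the abstract does not give. As a stand-in, the sketch uses a plain binomial tail test against a uniform null (every position in the domain is equally likely to be hit) with a Bonferroni correction; the function names, counts, and domain length are all hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def position_hotspots(position_counts, domain_length, alpha=0.05):
    """Flag domain positions hit more often than a uniform null predicts.

    position_counts maps a domain position to the number of mutations
    aligned to it. Assumed null: each mutation lands on any of the
    domain_length positions with probability 1 / domain_length.
    The threshold is Bonferroni-corrected for testing every position.
    """
    total = sum(position_counts.values())
    p_null = 1.0 / domain_length
    threshold = alpha / domain_length
    hotspots = {}
    for position, count in position_counts.items():
        p_value = binomial_tail(count, total, p_null)
        if p_value <= threshold:
            hotspots[position] = p_value
    return hotspots


# Hypothetical counts: 10 mutations mapped onto a 50-position domain,
# 8 of them at position 12.
COUNTS = {12: 8, 30: 1, 45: 1}
HOTSPOTS = position_hotspots(COUNTS, domain_length=50)
```

With these made-up counts only position 12 passes the corrected threshold; the two singleton positions do not. A uniform null is a deliberate simplification, and a real analysis would need a background model that accounts for sequence composition and domain coverage.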

Subject(s)

Mutation , Neoplasms/genetics , Protein Structure, Tertiary/genetics , Proteins/genetics , Databases, Protein , Disease/genetics , Humans , Proteins/chemistry

Bioinformatics for personal genome interpretation.

Capriotti, Emidio; Nehrt, Nathan L; Kann, Maricel G; Bromberg, Yana.

Brief Bioinform ; 13(4): 495-512, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22247263

ABSTRACT

An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field--the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.

Subject(s)

Computational Biology/methods , Genetic Variation , Genome , Databases, Genetic , Genotype , Human Genome Project , Humans , Phenotype

Testing the ortholog conjecture with comparative functional genomic data from mammals.

Nehrt, Nathan L; Clark, Wyatt T; Radivojac, Predrag; Hahn, Matthew W.

PLoS Comput Biol ; 7(6): e1002073, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21695233

ABSTRACT

A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.

Subject(s)

Comparative Genomic Hybridization , Evolution, Molecular , Genes , Animals , Gene Dosage , Gene Expression Profiling , Humans , Mice , Oligonucleotide Array Sequence Analysis , Proteins/genetics

