Results 1 - 13 of 13
1.
Hum Mutat ; 39(12): 2025-2039, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30204945

ABSTRACT

The widespread use of next generation sequencing for clinical testing is detecting an escalating number of variants in noncoding regions of the genome. The clinical significance of the majority of these variants is currently unknown, which presents a significant clinical challenge. We have screened over 6,000 early-onset and/or familial breast cancer (BC) cases collected by the ENIGMA consortium for sequence variants in the 5' noncoding regions of BC susceptibility genes BRCA1 and BRCA2, and identified 141 rare variants with global minor allele frequency < 0.01, 76 of which have not been reported previously. Bioinformatic analysis identified a set of 21 variants most likely to impact transcriptional regulation, and luciferase reporter assays detected altered promoter activity for four of these variants. Electrophoretic mobility shift assays demonstrated that three of these altered the binding of proteins to the respective BRCA1 or BRCA2 promoter regions, including NFYA binding to BRCA1:c.-287C>T and PAX5 binding to BRCA2:c.-296C>T. Clinical classification of variants affecting promoter activity, using existing prediction models, found no evidence to suggest that these variants confer a high risk of disease. Further studies are required to determine if such variation may be associated with a moderate or low risk of BC.

2.
F1000Res ; 7: 233, 2018.
Article in English | MEDLINE | ID: mdl-29904591

ABSTRACT

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
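
The mRMR ranking step named in the Methods can be illustrated with a greedy loop. This is a minimal sketch under stated assumptions, not the authors' pipeline: expression values are assumed to be already discretized, the toy features (`gene_a`, `gene_a_dup`, `gene_c`) are hypothetical, and each candidate is scored as its mutual information with the class label minus its mean mutual information with the features already selected (the "difference" form of mRMR).

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_rank(features, labels, k):
    """Greedy mRMR: return up to k feature names, best first.

    features: dict mapping a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, binary exposure label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],      # informative
    "gene_a_dup": [0, 0, 0, 1, 1, 1, 1, 1],  # redundant copy of gene_a
    "gene_c": [0, 1, 0, 0, 0, 1, 1, 1],      # weaker but complementary
}
ranking = mrmr_rank(features, labels, 3)
print(ranking)
```

In this toy case the redundant copy is pushed behind the weaker but complementary feature, which is the behavior mRMR is designed to produce. A ranked list like this would then feed a wrapper method such as sequential feature selection with an SVM.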

3.
F1000Res ; 7: 1908, 2018.
Article in English | MEDLINE | ID: mdl-31275557

ABSTRACT

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction spanning and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 351,423 of these validated mutations, the majority of which (69.1%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 117,951 unique mutations which weaken or abolish natural splice sites, and 244,415 mutations which strengthen cryptic splice sites (10,943 affect both simultaneously). 27,803 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon "Validated Splicing Mutations" either separately or in aggregate alongside other Beacons through the public Beacon Network (http://www.beacon-network.org/#/search?beacon=cytognomix), as well as through our website (https://validsplicemut.cytognomix.com/).
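
The three category counts in this abstract agree with the stated total if mutations that affect both a natural and a cryptic site are counted only once. A quick check of that arithmetic follows; the variable names are ours, and the inclusion-exclusion reading is an assumption about how the counts relate.

```python
# Counts quoted in the abstract.
weaken_natural = 117_951      # weaken or abolish natural splice sites
strengthen_cryptic = 244_415  # strengthen cryptic splice sites
affect_both = 10_943          # counted in both categories above

# Inclusion-exclusion: subtract the overlap so each mutation is counted once.
total_unique = weaken_natural + strengthen_cryptic - affect_both
print(total_unique)
```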

4.
Breast Cancer Res Treat ; 165(3): 687-697, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28664506

ABSTRACT

PURPOSE: To characterize the spectrum of germline mutations in BRCA1, BRCA2, and PALB2 in population-based unselected breast cancer cases in an Asian population. METHODS: Germline DNA from 467 breast cancer patients in Sarawak General Hospital, Malaysia, where 93% of the breast cancer patients in Sarawak are treated, was sequenced for the entire coding region of BRCA1; BRCA2; PALB2; Exons 6, 7, and 8 of TP53; and Exons 7 and 8 of PTEN. Pathogenic variants included known pathogenic variants in ClinVar, loss of function variants, and variants that disrupt splice site. RESULTS: We found 27 pathogenic variants (11 BRCA1, 10 BRCA2, 4 PALB2, and 2 TP53) in 34 patients, which gave a prevalence of germline mutations of 2.8, 3.23, and 0.86% for BRCA1, BRCA2, and PALB2, respectively. Compared to mutation non-carriers, BRCA1 mutation carriers were more likely to have an earlier age at onset, triple-negative subtype, and lower body mass index, whereas BRCA2 mutation carriers were more likely to have a positive family history. Mutation carrier cases had worse survival compared to non-carriers; however, the association was mostly driven by stage and tumor subtype. We also identified 19 variants of unknown significance, and some of them were predicted to alter splicing or transcription factor binding sites. CONCLUSION: Our data provide insight into the genetics of breast cancer in this understudied group and suggest the need for modifying genetic testing guidelines for this population with a much younger age at diagnosis and more limited resources compared with Caucasian populations.


Subjects
Breast Neoplasms/epidemiology , Breast Neoplasms/genetics , Fanconi Anemia Complementation Group N Protein/genetics , Genes, BRCA1 , Genes, BRCA2 , Genetic Predisposition to Disease , Germ-Line Mutation , Adult , Aged , Aged, 80 and over , Alleles , Biomarkers, Tumor , Breast Neoplasms/diagnosis , Breast Neoplasms/therapy , DNA Mutational Analysis , Female , Humans , Malaysia/epidemiology , Middle Aged , Polymorphism, Single Nucleotide , Population Surveillance , Pregnancy , Prevalence , Risk Factors , Young Adult
5.
Nucleic Acids Res ; 45(5): e27, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27899659

ABSTRACT

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes.


Subjects
Information Theory , Oligonucleotide Array Sequence Analysis , Position-Specific Scoring Matrices , Transcription Factors/metabolism , Binding Sites , Datasets as Topic , Entropy , Genome, Human , HeLa Cells , Humans , K562 Cells , Nucleotide Motifs , Polymorphism, Single Nucleotide , Protein Binding , Reproducibility of Results , Transcription Factors/genetics
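
An information theory-based weight matrix is built from a set of aligned binding sites. As a minimal sketch (it is not the recursive, thresholded entropy-minimization pipeline described in this record), the following computes the information content of each alignment column as 2 - H bits, where H is the Shannon entropy of the column. The aligned sites are hypothetical and the small-sample correction is omitted for brevity.

```python
from collections import Counter
from math import log2


def position_information(sites):
    """Information content (bits) of each column in equal-length DNA sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    info = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        # Maximum uncertainty for 4 nucleotides is 2 bits.
        info.append(2.0 - entropy)
    return info


# Hypothetical aligned binding sites (illustration only).
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
info = position_information(sites)
total_information = sum(info)
print(info, total_information)
```

Fully conserved columns contribute 2 bits, while the column that is split evenly between two nucleotides contributes 1 bit.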
6.
BMC Med Genomics ; 9: 19, 2016 Apr 11.
Article in English | MEDLINE | ID: mdl-27067391

ABSTRACT

BACKGROUND: Sequencing of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to significantly contribute to high penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and prioritizing patients with large intragenic deletions. METHODS: We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure. RESULTS: 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. An intragenic 32.1 kb interval in BRCA2 that was likely hemizygous was detected in one patient. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels). 
CONCLUSIONS: We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.


Subjects
Breast Neoplasms/genetics , DNA, Intergenic/genetics , Genetic Predisposition to Disease , Inheritance Patterns/genetics , Mutation/genetics , Ovarian Neoplasms/genetics , Base Sequence , Exons/genetics , Female , Humans , Information Theory , Molecular Sequence Data , Nucleic Acid Conformation , Polymorphism, Single Nucleotide/genetics , Protein Binding/genetics , Protein Isoforms/genetics , RNA Splice Sites/genetics , Sequence Alignment , Sequence Analysis, DNA , Sequence Deletion/genetics , Untranslated Regions/genetics
7.
Hum Mutat ; 37(7): 640-52, 2016 07.
Article in English | MEDLINE | ID: mdl-26898890

ABSTRACT

BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N = 287), including noncoding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes in transcription factor (TFBSs), splicing regulatory (SRBSs), and RNA-binding protein (RBBSs) binding sites following mutation. We prioritized variants affecting the strengths of 10 splice sites (four natural, six cryptic), 148 SRBS, 36 TFBS, and 31 RBBS. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure and 17 for pseudoexon activation. Additionally, four frameshift, two in-frame deletions, and five stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.


Subjects
Gene Regulatory Networks , Genetic Variation , Hereditary Breast and Ovarian Cancer Syndrome/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Female , Genetic Predisposition to Disease , Humans , Middle Aged , Nucleic Acid Conformation , RNA Splicing , RNA, Messenger/chemistry , RNA, Messenger/genetics , Sequence Analysis, DNA
8.
F1000Res ; 5: 2124, 2016.
Article in English | MEDLINE | ID: mdl-28620450

ABSTRACT

Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing the ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, TUBB4B genes was 78.6% accurate in 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. Minimum redundancy maximum relevance feature selection of a paclitaxel-based SVM classifier based on expression of ABCB11, ABCC1, BAD, BBC3 and BCL2L1 was 79% accurate in 53 CT patients. A random forest (RF) classifier produced a gene signature (ABCB11, ABCC1, BAD, BCL2, CYP2C8, CYP3A4, MAP4, MAPT, NR1I2, TUBB1, GBP1, OPRK1) that predicted >3 year survival with 82.4% accuracy in 420 HT patients. A similar RF gene signature showed 79.6% accuracy in 504 patients treated with CT and/or HT. These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies.

9.
F1000Res ; 3: 282, 2014.
Article in English | MEDLINE | ID: mdl-25717368

ABSTRACT

The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
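
The link between splice site information and binding affinity described in this abstract can be made concrete. The sketch below rests on two assumptions drawn from the information-theory splicing literature rather than from this abstract: a change of ΔRi bits implies at least a 2^|ΔRi|-fold change in binding affinity, and a natural site that falls below a minimum functional strength is treated as abolished rather than leaky. The 1.6-bit default threshold and the function names are illustrative and adjustable.

```python
def min_fold_change(delta_ri_bits):
    """Lower bound on the fold change in binding affinity for a change in Ri."""
    return 2.0 ** abs(delta_ri_bits)


def classify_natural_site(ri_ref, ri_var, ri_min=1.6):
    """Label the predicted effect of a variant on a natural splice site.

    ri_ref, ri_var: information content (bits) before and after mutation.
    ri_min: assumed minimum strength of a functional site (adjustable).
    """
    if ri_var >= ri_ref:
        return "not weakened"
    if ri_var < ri_min:
        return "abolished"
    return "leaky"


# Hypothetical values, for illustration only.
print(min_fold_change(-3.0))             # 3-bit loss: at least 8-fold weaker
print(classify_natural_site(8.0, 5.0))   # reduced but above threshold
print(classify_natural_site(8.0, 0.5))   # below threshold
```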

10.
Genomics Proteomics Bioinformatics ; 11(2): 77-85, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23499923

ABSTRACT

Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present the Shannon pipeline software for genome-scale mutation analysis and provide evidence that the software predicts variants affecting mRNA splicing. Individual information contents (in bits) of reference and variant splice sites are compared and significant differences are annotated and prioritized. The software has been implemented for CLC-Bio Genomics platform. Annotation indicates the context of novel mutations as well as common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing variants are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in genomes of three cancer cell lines (U2OS, U251 and A431), which were supported by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the software, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6-17 inactivating mutations, 1-5 leaky mutations and 6-13 cryptic splicing mutations. Predicted effects were validated by RNA-seq analysis of the three aforementioned cancer cell lines, and expression microarray analysis of SNPs in HapMap cell lines.


Subjects
Genome, Human , Mutation , RNA Splicing/genetics , Software , Gene Expression , Humans , Point Mutation , Polymorphism, Single Nucleotide , RNA, Messenger/genetics , RNA, Neoplasm/genetics
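
The comparison of reference and variant information contents described in this record can be sketched as follows. This is an illustration under stated assumptions, not the Shannon pipeline itself: the per-position nucleotide frequencies in `FREQS` are hypothetical, the individual information of a sequence is taken as the sum over positions of 2 + log2 f(b, l), and the small-sample correction is omitted.

```python
from math import log2

# Hypothetical nucleotide frequencies at each position of a 4-nt site model.
FREQS = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def individual_information(seq, freqs=FREQS):
    """Individual information Ri (bits) of one sequence under the site model."""
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the model length")
    return sum(2.0 + log2(freqs[pos][base]) for pos, base in enumerate(seq))


def delta_ri(ref_seq, var_seq, freqs=FREQS):
    """Change in Ri caused by a variant (negative means a weaker site)."""
    return individual_information(var_seq, freqs) - individual_information(ref_seq, freqs)


# A substitution at the most conserved position weakens the site sharply.
change = delta_ri("GTAA", "GAAA")
print(round(change, 2))
```

Variants at uninformative positions, such as the last column of this toy model, leave Ri unchanged, which is how an annotation step can separate likely neutral changes from candidate splicing mutations.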
11.
Hum Mutat ; 34(4): 557-65, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23348723

ABSTRACT

Mutations that affect mRNA splicing often produce multiple mRNA isoforms, resulting in complex molecular phenotypes. Definition of an exon and its inclusion in mature mRNA relies on joint recognition of both acceptor and donor splice sites. This study predicts cryptic and exon-skipping isoforms in mRNA produced by splicing mutations from the combined information contents (R(i), which measures binding-site strength, in bits) and distribution of the splice sites defining these exons. The total information content of an exon (R(i,total)) is the sum of the R(i) values of its acceptor and donor splice sites, adjusted for the self-information of the distance separating these sites, that is, the gap surprisal. Differences between total information contents of an exon (ΔR(i,total)) are predictive of the relative abundance of these exons in distinct processed mRNAs. Constraints on splice site and exon selection are used to eliminate nonconforming and poorly expressed isoforms. Molecular phenotypes are computed by the Automated Splice Site and Exon Definition Analysis (http://splice.uwo.ca) server. Predictions of splicing mutations were highly concordant (85.2%; n = 61) with published expression data. In silico exon definition analysis will contribute to streamlining assessment of abnormal and normal splice isoforms resulting from mutations.


Subjects
Computational Biology , Exons , Mutation , RNA Isoforms , RNA Splicing , Algorithms , Computational Biology/methods , Databases, Nucleic Acid , Information Theory , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Reproducibility of Results
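
The exon-level quantity defined in this abstract can be written out as a short calculation. The sketch assumes that the adjustment is a subtraction of the gap surprisal, taken as -log2 of the frequency of exons with the given length; the length frequencies and Ri values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import log2


def gap_surprisal(exon_length, length_freqs):
    """Self-information (bits) of an exon length under a length distribution."""
    p = length_freqs.get(exon_length)
    if not p:
        raise ValueError(f"no frequency available for length {exon_length}")
    return -log2(p)


def ri_total(ri_acceptor, ri_donor, exon_length, length_freqs):
    """Total exon information: acceptor Ri + donor Ri, minus the gap surprisal."""
    return ri_acceptor + ri_donor - gap_surprisal(exon_length, length_freqs)


# Hypothetical exon-length frequencies (illustration only).
length_freqs = {120: 0.25, 45: 0.0625}

natural = ri_total(9.0, 8.0, 120, length_freqs)  # natural exon
cryptic = ri_total(9.0, 6.0, 45, length_freqs)   # exon using a cryptic donor
delta_ri_total = natural - cryptic
print(natural, cryptic, delta_ri_total)
```

Under these toy numbers the natural exon exceeds the cryptic one by 4 bits, so the natural isoform would be predicted to be the more abundant of the two.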
12.
Hum Mutat ; 32(7): 735-42, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21523855

ABSTRACT

Variants of uncertain significance (VUS) in the BRCA1 and BRCA2 genes potentially affecting coding sequence as well as normal splicing activity have confounded predisposition testing in breast cancer. Here, we apply information theory to analyze BRCA1/2 mRNA splicing mutations categorized as VUS. The method was validated for 31 of 36 mutations known to cause missplicing in BRCA1/2 and all 26 that do not alter splicing. All single-nucleotide variants in the Breast Cancer Information Resource (BIC; Breast Cancer Information Core Database; http://research.nhgri.nih.gov/bic; last access June 1, 2010) were then analyzed. Information analysis is similar in sensitivity to other predictive methods; however, the thermodynamic basis of the theory also enables splice-site affinity to be determined accurately, which is important for assessing mutations that render natural splice sites partially functional and competition between cryptic and natural splice sites. We report 299 of 2,071 single-nucleotide BIC mutations that are predicted to significantly weaken natural sites and/or strengthen cryptic splice sites, 171 of which are not designated as splicing mutations in the database. Splicing alterations are predicted for 68 of 690 BRCA1 and 60 of 958 BRCA2 mutations designated as VUS. These analyses should be useful in prioritizing suspected mutations for downstream expression studies and for predicting aberrantly spliced isoforms generated by these mutations.


Subjects
Alternative Splicing/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/genetics , RNA, Messenger/genetics , Amino Acid Sequence , Computational Biology , Databases, Genetic , Female , Genetic Variation , Humans , Information Theory , Models, Genetic , Molecular Sequence Data , Mutation
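
The counts reported in this abstract can be turned into proportions for easier comparison. Reading the validation counts as sensitivity (31 of 36 known missplicing mutations) and specificity (all 26 mutations that do not alter splicing) is our interpretation; the abstract itself gives only the raw counts.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


sensitivity = percent(31, 36)      # known missplicing mutations detected
specificity = percent(26, 26)      # non-splicing mutations correctly called
bic_predicted = percent(299, 2071) # BIC single-nucleotide mutations flagged
brca1_vus = percent(68, 690)       # BRCA1 VUS with predicted splicing change
brca2_vus = percent(60, 958)       # BRCA2 VUS with predicted splicing change

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("BIC mutations flagged", bic_predicted),
    ("BRCA1 VUS flagged", brca1_vus),
    ("BRCA2 VUS flagged", brca2_vus),
]:
    print(f"{name}: {value:.1f}%")
```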
13.
Biochem J ; 418(2): 391-401, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-18973475

ABSTRACT

hYVH1 [human orthologue of YVH1 (yeast VH1-related phosphatase)] is an atypical dual-specificity phosphatase that is widely conserved throughout evolution. Deletion studies in yeast have suggested a role for this phosphatase in regulating cell growth. However, the role of the human orthologue is unknown. The present study used MS to identify Hsp70 (heat-shock protein 70) as a novel hYVH1-binding partner. The interaction was confirmed using endogenous co-immunoprecipitation experiments and direct binding of purified proteins. Endogenous Hsp70 and hYVH1 proteins were also found to co-localize specifically to the perinuclear region in response to heat stress. Domain deletion studies revealed that the ATPase effector domain of Hsp70 and the zinc-binding domain of hYVH1 are required for the interaction, indicating that this association is not simply a chaperone-substrate complex. Thermal phosphatase assays revealed hYVH1 activity to be unaffected by heat and only marginally affected by non-reducing conditions, in contrast with the archetypical dual-specificity phosphatase VHR (VH1-related protein). In addition, Hsp70 is capable of increasing the phosphatase activity of hYVH1 towards an exogenous substrate under non-reducing conditions. Furthermore, the expression of hYVH1 repressed cell death induced by heat shock, H2O2 and Fas receptor activation but not cisplatin. Co-expression of hYVH1 with Hsp70 further enhanced cell survival. Meanwhile, expression of a catalytically inactive hYVH1 or a hYVH1 variant that is unable to interact with Hsp70 failed to protect cells from the various stress conditions. The results suggest that hYVH1 is a novel cell survival phosphatase that co-operates with Hsp70 to positively affect cell viability in response to cellular insults.


Subjects
Dual Specificity Phosphatase 1/metabolism , Dual Specificity Phosphatase 1/physiology , HSP70 Heat-Shock Proteins/metabolism , Heat-Shock Response , Amino Acid Sequence , Cell Death/genetics , Cell Death/physiology , Cell Survival/genetics , Cells, Cultured , Dual Specificity Phosphatase 1/chemistry , Dual Specificity Phosphatase 1/genetics , Dual-Specificity Phosphatases/chemistry , Dual-Specificity Phosphatases/genetics , Dual-Specificity Phosphatases/metabolism , Dual-Specificity Phosphatases/physiology , HeLa Cells , Heat-Shock Response/physiology , Humans , Molecular Chaperones/metabolism , Molecular Chaperones/physiology , Protein Binding/physiology , Protein Interaction Domains and Motifs , Transfection