Results 1 - 20 of 22
1.
Nucleic Acids Res ; 45(5): e27, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27899659

ABSTRACT

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes.
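
The iPWMs described above give each base at each position a weight in bits, and the strength of an individual site (R_i) is the sum of its weights. The Python sketch below illustrates only that scoring step, under stated assumptions: weights are computed as R_i(b,l) = 2 + log2 f(b,l) from a toy alignment, a pseudocount stands in for the small-sample correction, and the function names (`build_ipwm`, `score_site`, `scan`) are hypothetical. It does not reproduce the recursive, thresholded entropy-minimization pipeline used in the study.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_ipwm(sites, pseudocount=0.5):
    """Build an information weight matrix (bits) from aligned, equal-length sites.

    Each entry is R_i(b, l) = 2 + log2 f(b, l), where f(b, l) is the
    pseudocount-smoothed frequency of base b at position l. The pseudocount
    is an assumption standing in for a small-sample correction.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos].upper() for site in sites)
        matrix.append({b: 2 + math.log2((counts[b] + pseudocount) / total) for b in BASES})
    return matrix


def score_site(matrix, seq):
    """Individual information R_i (bits) of seq: the sum of its per-position weights."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix width")
    return sum(column[base] for column, base in zip(matrix, seq.upper()))


def scan(matrix, sequence, threshold=0.0):
    """Return (start, R_i) for each window scoring above threshold.

    Windows containing characters other than A, C, G, T raise KeyError.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        r_i = score_site(matrix, sequence[start:start + width])
        if r_i > threshold:
            hits.append((start, r_i))
    return hits
```

With the four toy sites ACGT, ACGT, ACGA and ACGT, scanning TTACGTTT returns a single hit starting at position 2; every other window scores below 0 bits. The 0-bit threshold is an illustrative choice, not a value taken from the study.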


Subject(s)
Information Theory , Oligonucleotide Array Sequence Analysis , Position-Specific Scoring Matrices , Transcription Factors/metabolism , Binding Sites , Datasets as Topic , Entropy , Genome, Human , HeLa Cells , Humans , K562 Cells , Nucleotide Motifs , Polymorphism, Single Nucleotide , Protein Binding , Reproducibility of Results , Transcription Factors/genetics
2.
Hum Mutat ; 39(12): 2025-2039, 2018 12.
Article in English | MEDLINE | ID: mdl-30204945

ABSTRACT

The widespread use of next generation sequencing for clinical testing is detecting an escalating number of variants in noncoding regions of the genome. The clinical significance of the majority of these variants is currently unknown, which presents a significant clinical challenge. We have screened over 6,000 early-onset and/or familial breast cancer (BC) cases collected by the ENIGMA consortium for sequence variants in the 5' noncoding regions of BC susceptibility genes BRCA1 and BRCA2, and identified 141 rare variants with global minor allele frequency < 0.01, 76 of which have not been reported previously. Bioinformatic analysis identified a set of 21 variants most likely to impact transcriptional regulation, and luciferase reporter assays detected altered promoter activity for four of these variants. Electrophoretic mobility shift assays demonstrated that three of these altered the binding of proteins to the respective BRCA1 or BRCA2 promoter regions, including NFYA binding to BRCA1:c.-287C>T and PAX5 binding to BRCA2:c.-296C>T. Clinical classification of variants affecting promoter activity, using existing prediction models, found no evidence to suggest that these variants confer a high risk of disease. Further studies are required to determine if such variation may be associated with a moderate or low risk of BC.


Subject(s)
BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/genetics , Germ-Line Mutation , Promoter Regions, Genetic , 5' Untranslated Regions , Age of Onset , BRCA1 Protein/chemistry , BRCA1 Protein/metabolism , BRCA2 Protein/chemistry , BRCA2 Protein/metabolism , CCAAT-Binding Factor/metabolism , Cell Line, Tumor , Female , Genetic Predisposition to Disease , Humans , MCF-7 Cells , PAX5 Transcription Factor/metabolism , Protein Binding
3.
Hum Mol Genet ; 24(18): 5345-55, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26130695

ABSTRACT

Numerous genetic factors that influence breast cancer risk are known. However, approximately two-thirds of the overall familial risk remains unexplained. To determine whether some of the missing heritability is due to rare variants conferring high to moderate risk, we tested for an association between the c.5791C>T nonsense mutation (p.Arg1931*; rs144567652) in exon 22 of the FANCM gene and breast cancer. An analysis of genotyping data from 8635 familial breast cancer cases and 6625 controls from different countries yielded an association between the c.5791C>T mutation and breast cancer risk [odds ratio (OR) = 3.93 (95% confidence interval (CI) = 1.28-12.11; P = 0.017)]. Moreover, we performed two meta-analyses of studies from countries with carriers in both cases and controls and of all available data. These analyses showed breast cancer associations with OR = 3.67 (95% CI = 1.04-12.87; P = 0.043) and OR = 3.33 (95% CI = 1.09-13.62; P = 0.032), respectively. Based on information theory-based prediction, we established that the mutation caused an out-of-frame deletion of exon 22, due to the creation of a binding site for the pre-mRNA processing protein hnRNP A1. Furthermore, genetic complementation analyses showed that the mutation influenced the DNA repair activity of the FANCM protein. In summary, we provide evidence for the first time showing that the common p.Arg1931* loss-of-function variant in FANCM is a risk factor for familial breast cancer.
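
The reported odds ratios summarize a 2x2 table of carriers and non-carriers among cases and controls. As a hedged illustration, the sketch below computes a crude odds ratio with a Woolf (log-scale Wald) 95% confidence interval. Any counts passed to it are hypothetical, and the published estimates may come from stratified or meta-analytic models that this crude calculation does not reproduce.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale Wald) confidence interval.

    a = carrier cases,    b = non-carrier cases,
    c = carrier controls, d = non-carrier controls.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: choose and apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

For example, with hypothetical counts of 10 carriers among 100 cases and 5 among 100 controls, `odds_ratio_ci(10, 90, 5, 95)` gives an odds ratio of about 2.11. Zero cells raise an error rather than silently applying a correction, because the choice of correction (for example, adding 0.5 to every cell) noticeably affects rare-variant estimates.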


Subject(s)
Alternative Splicing , Codon, Nonsense , DNA Helicases/genetics , DNA Repair , Exons , Adult , Age of Onset , Alleles , Binding Sites , Breast Neoplasms/epidemiology , Breast Neoplasms/genetics , Case-Control Studies , DNA Helicases/metabolism , DNA Mutational Analysis , Female , Gene Expression , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Genotype , Heterogeneous Nuclear Ribonucleoprotein A1 , Heterogeneous-Nuclear Ribonucleoprotein Group A-B/metabolism , Humans , Meta-Analysis as Topic , Middle Aged , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Risk Factors , Young Adult
4.
Breast Cancer Res Treat ; 165(3): 687-697, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28664506

ABSTRACT

PURPOSE: To characterize the spectrum of germline mutations in BRCA1, BRCA2, and PALB2 in population-based unselected breast cancer cases in an Asian population. METHODS: Germline DNA from 467 breast cancer patients in Sarawak General Hospital, Malaysia, where 93% of the breast cancer patients in Sarawak are treated, was sequenced for the entire coding region of BRCA1; BRCA2; PALB2; Exons 6, 7, and 8 of TP53; and Exons 7 and 8 of PTEN. Pathogenic variants included known pathogenic variants in ClinVar, loss-of-function variants, and variants that disrupt splice sites. RESULTS: We found 27 pathogenic variants (11 BRCA1, 10 BRCA2, 4 PALB2, and 2 TP53) in 34 patients, which gave a prevalence of germline mutations of 2.8, 3.23, and 0.86% for BRCA1, BRCA2, and PALB2, respectively. Compared to mutation non-carriers, BRCA1 mutation carriers were more likely to have an earlier age at onset, triple-negative subtype, and lower body mass index, whereas BRCA2 mutation carriers were more likely to have a positive family history. Mutation carrier cases had worse survival compared to non-carriers; however, the association was mostly driven by stage and tumor subtype. We also identified 19 variants of unknown significance, and some of them were predicted to alter splicing or transcription factor binding sites. CONCLUSION: Our data provide insight into the genetics of breast cancer in this understudied group and suggest the need for modifying genetic testing guidelines for this population with a much younger age at diagnosis and more limited resources compared with Caucasian populations.


Subject(s)
Breast Neoplasms/epidemiology , Breast Neoplasms/genetics , Fanconi Anemia Complementation Group N Protein/genetics , Genes, BRCA1 , Genes, BRCA2 , Genetic Predisposition to Disease , Germ-Line Mutation , Adult , Aged , Aged, 80 and over , Alleles , Biomarkers, Tumor , Breast Neoplasms/diagnosis , Breast Neoplasms/therapy , DNA Mutational Analysis , Female , Humans , Malaysia/epidemiology , Middle Aged , Polymorphism, Single Nucleotide , Population Surveillance , Pregnancy , Prevalence , Risk Factors , Young Adult
5.
Hum Mutat ; 37(7): 640-52, 2016 07.
Article in English | MEDLINE | ID: mdl-26898890

ABSTRACT

BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N = 287), including noncoding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes in transcription factor (TFBSs), splicing regulatory (SRBSs), and RNA-binding protein (RBBSs) binding sites following mutation. We prioritized variants affecting the strengths of 10 splice sites (four natural, six cryptic), 148 SRBS, 36 TFBS, and 31 RBBS. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure and 17 for pseudoexon activation. Additionally, four frameshift, two in-frame deletions, and five stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.


Subject(s)
Gene Regulatory Networks , Genetic Variation , Hereditary Breast and Ovarian Cancer Syndrome/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Female , Genetic Predisposition to Disease , Humans , Middle Aged , Nucleic Acid Conformation , RNA Splicing , RNA, Messenger/chemistry , RNA, Messenger/genetics , Sequence Analysis, DNA
6.
Hum Mutat ; 34(4): 557-65, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23348723

ABSTRACT

Mutations that affect mRNA splicing often produce multiple mRNA isoforms, resulting in complex molecular phenotypes. Definition of an exon and its inclusion in mature mRNA relies on joint recognition of both acceptor and donor splice sites. This study predicts cryptic and exon-skipping isoforms in mRNA produced by splicing mutations from the combined information contents ($R_i$, which measures binding-site strength, in bits) and distribution of the splice sites defining these exons. The total information content of an exon ($R_{i,\mathrm{total}}$) is the sum of the $R_i$ values of its acceptor and donor splice sites, adjusted for the self-information of the distance separating these sites, that is, the gap surprisal. Differences between total information contents of an exon ($\Delta R_{i,\mathrm{total}}$) are predictive of the relative abundance of these exons in distinct processed mRNAs. Constraints on splice site and exon selection are used to eliminate nonconforming and poorly expressed isoforms. Molecular phenotypes are computed by the Automated Splice Site and Exon Definition Analysis (http://splice.uwo.ca) server. Predictions of splicing mutations were highly concordant (85.2%; n = 61) with published expression data. In silico exon definition analysis will contribute to streamlining assessment of abnormal and normal splice isoforms resulting from mutations.
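
The exon-definition score can be written as $R_{i,\mathrm{total}} = R_{i,\mathrm{acceptor}} + R_{i,\mathrm{donor}} - G(d)$, where $G(d) = -\log_2 P(d)$ is the gap surprisal for an exon of length $d$. The sketch below assumes that the gap surprisal is subtracted (so a rarely observed exon length lowers the score) and uses a toy, hypothetical exon-length distribution; the function names are also hypothetical, and this is not the server's implementation.

```python
import math


def gap_surprisal(exon_length, length_probs):
    """Self-information, -log2 P(length), of an exon length in bits.

    length_probs is a hypothetical {length: probability} table; unseen
    lengths get infinite surprisal.
    """
    p = length_probs.get(exon_length, 0.0)
    return -math.log2(p) if p > 0 else math.inf


def total_information(ri_acceptor, ri_donor, exon_length, length_probs):
    """R_i,total = acceptor R_i + donor R_i - gap surprisal (sign is an assumption)."""
    return ri_acceptor + ri_donor - gap_surprisal(exon_length, length_probs)


def rank_exons(candidates, length_probs):
    """Sort (name, ri_acceptor, ri_donor, exon_length) tuples by R_i,total, highest first."""
    scored = [
        (name, total_information(acc, don, length, length_probs))
        for name, acc, don, length in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a natural exon scoring 8 + 9 bits at a length of probability 0.25 and a cryptic alternative scoring 6 + 9 bits at a length of probability 0.125, the totals are 15 and 12 bits, so $\Delta R_{i,\mathrm{total}} = 3$ bits favors the natural exon.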


Subject(s)
Computational Biology , Exons , Mutation , RNA Isoforms , RNA Splicing , Algorithms , Computational Biology/methods , Databases, Nucleic Acid , Information Theory , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Reproducibility of Results
7.
Radiat Prot Dosimetry ; 199(14): 1465-1471, 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37721084

ABSTRACT

Rapid sample processing and interpretation of estimated exposures will be critical for triaging exposed individuals after a major radiation incident. The dicentric chromosome (DC) assay assesses absorbed radiation using metaphase cells from blood. The Automated Dicentric Chromosome Identifier and Dose Estimator System (ADCI) identifies DCs and determines radiation doses. This study aimed to broaden accessibility and speed of this system, while protecting data and software integrity. ADCI Online is a secure web-streaming platform accessible worldwide from local servers. Cloud-based systems containing data and software are separated until they are linked for radiation exposure estimation. Dose estimates are identical to ADCI on dedicated computer hardware. Image processing and selection, calibration curve generation, and dose estimation of 9 test samples completed in < 2 days. ADCI Online has the capacity to alleviate analytic bottlenecks in intermediate-to-large radiation incidents. Multiple cloned software instances configured on different cloud environments accelerated dose estimation to within clinically relevant time frames.
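
Dicentric-assay dose estimation conventionally fits a linear-quadratic calibration curve, Y = C + αD + βD², relating dicentric yield per cell (Y) to absorbed dose (D), and then inverts it for a test sample. The sketch below shows that inversion only. The coefficients are hypothetical placeholders, and the abstract does not describe how ADCI fits its curves or handles uncertainty, so this should not be read as ADCI's code.

```python
import math


def estimate_dose(dicentrics, cells, c, alpha, beta):
    """Invert a linear-quadratic calibration curve Y = c + alpha*D + beta*D**2.

    dicentrics / cells gives the observed yield Y per cell; returns D in Gy.
    c, alpha, beta are calibration coefficients (hypothetical in this sketch).
    """
    if cells <= 0:
        raise ValueError("cells must be positive")
    y = dicentrics / cells
    if y <= c:
        return 0.0  # yield at or below background
    if beta == 0:
        if alpha == 0:
            raise ValueError("alpha and beta cannot both be zero")
        return (y - c) / alpha  # purely linear curve
    # Positive root of beta*D**2 + alpha*D + (c - y) = 0
    return (-alpha + math.sqrt(alpha ** 2 + 4 * beta * (y - c))) / (2 * beta)
```

With hypothetical coefficients C = 0.001, α = 0.03 Gy⁻¹ and β = 0.06 Gy⁻², a sample with 301 dicentrics in 1,000 cells (Y = 0.301) maps back to 2.0 Gy.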


Subject(s)
Cloud Computing , Radiation Exposure , Humans , Software , Biological Assay
8.
Int J Radiat Biol ; 98(5): 924-941, 2022.
Article in English | MEDLINE | ID: mdl-34699300

ABSTRACT

PURPOSE: Combinations of expressed genes can discriminate radiation-exposed from normal control blood samples by machine learning (ML) based signatures (with 8-20% misclassification rates). These signatures can quantify therapeutically relevant as well as accidental radiation exposures. The prodromal symptoms of acute radiation syndrome (ARS) overlap those present in influenza and dengue fever infections. Surprisingly, these human radiation signatures misclassified gene expression profiles of virally infected samples as false positive exposures. The present study investigates these and other confounders, and then mitigates their impact on signature accuracy. METHODS: This study investigated recall by previous and novel radiation signatures independently derived from multiple Gene Expression Omnibus datasets on common and rare non-neoplastic blood disorders and blood-borne infections (thromboembolism, S. aureus bacteremia, malaria, sickle cell disease, polycythemia vera, and aplastic anemia). Normalized expression levels of signature genes are used as input to ML-based classifiers to predict radiation exposure in other hematological conditions. RESULTS: Except for aplastic anemia, these blood-borne disorders modify the normal baseline expression values of genes present in radiation signatures, leading to false-positive misclassification of radiation exposures in 8-54% of individuals. Shared changes, predominantly in DNA damage response and apoptosis-related gene transcripts in radiation and confounding hematological conditions, compromise the utility of these signatures for radiation assessment. These confounding conditions (sickle cell disease, thrombosis, S. aureus bacteremia, malaria) induce neutrophil extracellular traps, initiated by chromatin decondensation, DNA damage response and fragmentation followed by programmed cell death or extrusion of DNA fragments. Riboviral infections (e.g. influenza or dengue fever) have been proposed to bind and deplete host RNA binding proteins, inducing R-loops in chromatin. R-loops that collide with incoming replication forks can result in incompletely repaired DNA damage, inducing apoptosis and releasing mature virus. To mitigate the effects of confounders, we evaluated predicted radiation-positive samples with novel gene expression signatures derived from radiation-responsive transcripts encoding secreted blood plasma proteins whose expression levels are unperturbed by these conditions. CONCLUSIONS: This approach identifies and eliminates misclassified samples with underlying hematological or infectious conditions, leaving only samples with true radiation exposures. Diagnostic accuracy is significantly improved by selecting genes that maximize both sensitivity and specificity in the appropriate tissue using combinations of the best signatures for each of these classes of signatures.
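
The mitigation step re-evaluates samples that the primary signature calls radiation-positive using a second signature built from transcripts unaffected by the confounding conditions. The sketch below shows only that decision logic, assuming both signatures have already produced boolean calls and that a sample stays positive only when both agree; combining the calls with a logical AND is an assumption, as the abstract does not specify the rule. The classifiers themselves are not reproduced, and the function names are hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Sensitivity and specificity of boolean calls (True = radiation-exposed)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def two_stage_calls(primary_calls, confirmatory_calls):
    """Keep a positive call only if the confounder-robust signature also calls it positive."""
    return [p and c for p, c in zip(primary_calls, confirmatory_calls)]
```

In a toy set of two exposed and three unexposed samples where the primary signature also flags two unexposed (confounded) samples, specificity rises from 1/3 to 1.0 once the confirmatory calls are applied, while sensitivity stays at 1.0 only if the confirmatory signature does not miss the true exposures.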


Subject(s)
Anemia, Aplastic , Anemia, Sickle Cell , Bacteremia , Dengue , Influenza, Human , Chromatin , Dengue/genetics , Gene Expression Profiling , Humans , Staphylococcus aureus
9.
Hum Mutat ; 32(7): 735-42, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21523855

ABSTRACT

Variants of uncertain significance (VUS) in the BRCA1 and BRCA2 genes potentially affecting coding sequence as well as normal splicing activity have confounded predisposition testing in breast cancer. Here, we apply information theory to analyze BRCA1/2 mRNA splicing mutations categorized as VUS. The method was validated for 31 of 36 mutations known to cause missplicing in BRCA1/2 and all 26 that do not alter splicing. All single-nucleotide variants in the Breast Cancer Information Resource (BIC; Breast Cancer Information Core Database; http://research.nhgri.nih.gov/bic; last access June 1, 2010) were then analyzed. Information analysis is similar in sensitivity to other predictive methods; however, the thermodynamic basis of the theory also enables splice-site affinity to be determined accurately, which is important for assessing mutations that render natural splice sites partially functional and competition between cryptic and natural splice sites. We report 299 of 2,071 single-nucleotide BIC mutations that are predicted to significantly weaken natural sites and/or strengthen cryptic splice sites, 171 of which are not designated as splicing mutations in the database. Splicing alterations are predicted for 68 of 690 BRCA1 and 60 of 958 BRCA2 mutations designated as VUS. These analyses should be useful in prioritizing suspected mutations for downstream expression studies and for predicting aberrantly spliced isoforms generated by these mutations.
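
The core of this analysis is the change in individual information, ΔR_i = R_i(variant) - R_i(reference), at a natural or cryptic splice site. The sketch below labels a single variant from precomputed R_i values. The 1-bit cutoff is a hypothetical parameter, not the significance criterion used in the study, and the fold-change helper encodes the relationship commonly cited in this literature, that each bit of R_i change corresponds to at least a twofold change in binding affinity.

```python
def classify_splice_variant(ri_reference, ri_variant, is_natural_site, min_change=1.0):
    """Label a variant's predicted effect on one splice site from its R_i values (bits).

    min_change is a hypothetical cutoff, not the study's significance criterion.
    """
    delta = ri_variant - ri_reference
    if is_natural_site and delta <= -min_change:
        return "weakened natural site"
    if not is_natural_site and delta >= min_change:
        return "strengthened cryptic site"
    return "no significant change"


def min_fold_change(delta_ri):
    """Lower bound on the affinity fold change implied by a change of delta_ri bits,
    assuming the commonly cited rule of at least twofold per bit."""
    return 2.0 ** abs(delta_ri)
```

A natural site dropping from 10.0 to 7.5 bits is flagged as weakened, with an implied affinity loss of at least 2^2.5, roughly 5.7-fold. Because the score is quantitative, a weakened site that retains substantial R_i can be distinguished from one reduced to near zero, which matters for the partially functional sites mentioned above.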


Subject(s)
Alternative Splicing/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/genetics , RNA, Messenger/genetics , Amino Acid Sequence , Computational Biology , Databases, Genetic , Female , Genetic Variation , Humans , Information Theory , Models, Genetic , Molecular Sequence Data , Mutation
10.
F1000Res ; 10: 1312, 2021.
Article in English | MEDLINE | ID: mdl-35646330

ABSTRACT

Introduction: This study aimed to produce community-level geospatial mapping of confirmed COVID-19 cases in Ontario, Canada, in near real time to support decision-making. This was accomplished by area-to-area geostatistical analysis, space-time integration, and spatial interpolation of COVID-19 positive individuals. Methods: COVID-19 cases and locations were curated for geostatistical analyses from March 2020 through June 2021, corresponding to the first, second, and third waves of infections. Daily cases were aggregated according to designated forward sortation areas (FSAs) and postal codes (PCs) in the municipal regions of Hamilton, Kitchener/Waterloo, London, Ottawa, Toronto, and Windsor/Essex County. Hotspots were identified with area-to-area tests, including Getis-Ord Gi*, Global Moran's I spatial autocorrelation, and Local Moran's I asymmetric clustering and outlier analyses. Case counts were also interpolated across geographic regions by Empirical Bayesian Kriging, which localizes high concentrations of COVID-19 positive tests independent of FSA or PC boundaries. The Geostatistical Disease Epidemiology Toolbox, which is freely available software, automates the identification of these regions and produces digital maps for public health professionals to assist in pandemic management, including contact tracing and distribution of other resources. Results: This study provided real-time indicators of likely community-level disease transmission through innovative geospatial analyses of COVID-19 incidence data. Municipal and provincial results were validated by comparisons with known outbreaks at long-term care and other high-density residences and on farms. PC-level analyses revealed hotspots at higher geospatial resolution than public reports of FSAs, and often sooner. Results of different tests and kriging were compared to determine consistency among hotspot assignments. Concurrent or consecutive hotspots in close proximity suggested potential community transmission of COVID-19 from cluster and outlier analysis of neighboring PCs and by kriging. Results were also stratified by population-based categories (sex, age, and presence/absence of comorbidities). Conclusions: Earlier recognition of hotspots could reduce public health burdens of COVID-19 and expedite contact tracing.
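
Global Moran's I, one of the tests named above, measures whether neighboring areas have similar case counts: positive values indicate clustering of similar values, negative values indicate a dispersed or checkerboard pattern, and the expected value under spatial randomness is -1/(n-1). The sketch below implements the textbook formula, I = (n / W) Σᵢ Σⱼ wᵢⱼ (xᵢ - x̄)(xⱼ - x̄) / Σᵢ (xᵢ - x̄)², with a caller-supplied spatial weights matrix. It is not code from the Geostatistical Disease Epidemiology Toolbox, and significance testing (z-scores or permutations) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for n areal units (e.g., case counts per postal code area).

    values:  n observations.
    weights: n x n spatial weights (list of lists) with zeros on the diagonal,
             e.g. 1 for adjacent areas and 0 otherwise (an illustrative choice).
    """
    n = len(values)
    if len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_total = sum(sum(row) for row in weights)  # W, the sum of all weights
    denom = sum(d * d for d in dev)
    if w_total == 0 or denom == 0:
        raise ValueError("need nonzero weights and non-constant values")
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * (cross / denom)
```

For four areas in a row with adjacency weights, counts [1, 1, 5, 5] give I = 1/3 (clustered), while [1, 5, 1, 5] give I = -1 (alternating).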


Subject(s)
COVID-19 , Bayes Theorem , COVID-19/epidemiology , Cluster Analysis , Humans , Incidence , Ontario/epidemiology
11.
Biochem J ; 418(2): 391-401, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-18973475

ABSTRACT

hYVH1 [human orthologue of YVH1 (yeast VH1-related phosphatase)] is an atypical dual-specificity phosphatase that is widely conserved throughout evolution. Deletion studies in yeast have suggested a role for this phosphatase in regulating cell growth. However, the role of the human orthologue is unknown. The present study used MS to identify Hsp70 (heat-shock protein 70) as a novel hYVH1-binding partner. The interaction was confirmed using endogenous co-immunoprecipitation experiments and direct binding of purified proteins. Endogenous Hsp70 and hYVH1 proteins were also found to co-localize specifically to the perinuclear region in response to heat stress. Domain deletion studies revealed that the ATPase effector domain of Hsp70 and the zinc-binding domain of hYVH1 are required for the interaction, indicating that this association is not simply a chaperone-substrate complex. Thermal phosphatase assays revealed hYVH1 activity to be unaffected by heat and only marginally affected by non-reducing conditions, in contrast with the archetypical dual-specificity phosphatase VHR (VH1-related protein). In addition, Hsp70 is capable of increasing the phosphatase activity of hYVH1 towards an exogenous substrate under non-reducing conditions. Furthermore, the expression of hYVH1 repressed cell death induced by heat shock, H2O2 and Fas receptor activation but not cisplatin. Co-expression of hYVH1 with Hsp70 further enhanced cell survival. Meanwhile, expression of a catalytically inactive hYVH1 or a hYVH1 variant that is unable to interact with Hsp70 failed to protect cells from the various stress conditions. The results suggest that hYVH1 is a novel cell survival phosphatase that co-operates with Hsp70 to positively affect cell viability in response to cellular insults.


Subject(s)
Dual Specificity Phosphatase 1/metabolism , Dual Specificity Phosphatase 1/physiology , HSP70 Heat-Shock Proteins/metabolism , Heat-Shock Response , Amino Acid Sequence , Cell Death/genetics , Cell Death/physiology , Cell Survival/genetics , Cells, Cultured , Dual Specificity Phosphatase 1/chemistry , Dual Specificity Phosphatase 1/genetics , Dual-Specificity Phosphatases/chemistry , Dual-Specificity Phosphatases/genetics , Dual-Specificity Phosphatases/metabolism , Dual-Specificity Phosphatases/physiology , HeLa Cells , Heat-Shock Response/physiology , Humans , Molecular Chaperones/metabolism , Molecular Chaperones/physiology , Protein Binding/physiology , Protein Interaction Domains and Motifs , Transfection
12.
F1000Res ; 9: 943, 2020.
Article in English | MEDLINE | ID: mdl-33299552

ABSTRACT

Background: Certain riboviruses can cause severe pulmonary complications leading to death in some infected patients. We propose that DNA damage-induced apoptosis accelerates viral release, triggered by depletion of host RNA binding proteins (RBPs) from nuclear RNA bound to replicating viral sequences. Methods: Information theory-based analysis of interactions between RBPs and individual sequences in the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), Influenza A (H3N2), HIV-1, and Dengue genomes identifies strong RBP binding sites in these viral genomes. Replication and expression of viral sequences are expected to increasingly sequester the RBPs SRSF1 and RNPS1. Ordinarily, RBPs bound to nascent host transcripts prevent their annealing to complementary DNA. Their depletion induces destabilizing R-loops. Chromosomal breakage occurs when excess unresolved R-loops collide with incoming replication forks, overwhelming the DNA repair machinery. We estimated the stoichiometry of inhibition of RBPs in host nuclear RNA by counting competing binding sites in replicating viral genomes and host RNA. Results: Host RBP binding sites are frequent and conserved among different strains of RNA viral genomes. Similar binding motifs of SRSF1 and RNPS1 explain why DNA damage resulting from SRSF1 depletion is complemented by expression of RNPS1. Clustering of strong RBP binding sites coincides with the distribution of RNA-DNA hybridization sites across the genome. SARS-CoV-2 replication is estimated to require 32.5-41.8 hours to effectively compete for binding of an equal proportion of SRSF1 binding sites in host encoded nuclear RNAs. Significant changes in expression of transcripts encoding DNA repair and apoptotic proteins were found in an analysis of influenza A and Dengue-infected cells in some individuals. Conclusions: R-loop-induced apoptosis indirectly resulting from viral replication could release significant quantities of membrane-associated virions into neighboring alveoli. These could infect adjacent pneumocytes and other tissues, rapidly compromising lung function, causing multiorgan system failure and other described symptoms.


Subject(s)
Lung Diseases/virology , R-Loop Structures , RNA-Binding Proteins/metabolism , RNA , Apoptosis , COVID-19 , Dengue Virus , HIV-1 , Humans , Influenza A Virus, H3N2 Subtype , Lung , Lung Diseases/pathology , Ribonucleoproteins , SARS-CoV-2 , Serine-Arginine Splicing Factors , Virus Replication
13.
Front Genet ; 11: 109, 2020.
Article in English | MEDLINE | ID: mdl-32211018

ABSTRACT

Splice isoform structure and abundance can be affected by either noncoding or masquerading coding variants that alter the structure or abundance of transcripts. When these variants are common in the population, these nonconstitutive transcripts are sufficiently frequent so as to resemble naturally occurring, alternative mRNA splicing. Prediction of the effects of such variants has been shown to be accurate using information theory-based methods. Single nucleotide polymorphisms (SNPs) predicted to significantly alter natural and/or cryptic splice site strength were shown to affect gene expression. Splicing changes for known SNP genotypes were confirmed in HapMap lymphoblastoid cell lines with gene expression microarrays and custom-designed q-RT-PCR or TaqMan assays. The majority of these SNPs (15 of 22) as well as an independent set of 24 variants were then subjected to RNAseq analysis using the ValidSpliceMut web beacon (http://validsplicemut.cytognomix.com), which is based on data from the Cancer Genome Atlas and International Cancer Genome Consortium. SNPs from different genes analyzed with gene expression microarray and q-RT-PCR exhibited significant changes in affected splice site use. Thirteen SNPs directly affected exon inclusion and 10 altered cryptic site use. Homozygous SNP genotypes resulting in stronger splice sites exhibited higher levels of processed mRNA than alleles associated with weaker sites. Four SNPs exhibited variable expression among individuals with the same genotypes, masking statistically significant expression differences between alleles. Genome-wide information theory and expression analyses (RNAseq) in tumor exomes and genomes confirmed splicing effects for 7 of the HapMap SNPs and 14 SNPs identified from tumor genomes. q-RT-PCR resolved rare splice isoforms with read abundance too low for statistical significance in ValidSpliceMut. Nevertheless, the web beacon provides evidence of unanticipated splicing outcomes, for example, intron retention due to compromised recognition of constitutive splice sites. Thus, ValidSpliceMut and q-RT-PCR represent complementary resources for identification of allele-specific, alternative splicing.

14.
MedComm (2020) ; 1(3): 311-327, 2020 Dec.
Article in English | MEDLINE | ID: mdl-34766125

ABSTRACT

Cancer chemotherapy responses have been related to multiple pharmacogenetic biomarkers, often for the same drug. This study utilizes machine learning to derive multi-gene expression signatures that predict individual patient responses to specific tyrosine kinase inhibitors, including erlotinib, gefitinib, sorafenib, sunitinib, lapatinib and imatinib. Support vector machine (SVM) learning was used to train mathematical models that distinguished sensitivity from resistance to these drugs using a novel systems biology-based approach. This began with expression of genes previously implicated in specific drug responses, then expanded to evaluate genes whose products were related through biochemical pathways and interactions. Optimal pathway-extended SVMs predicted responses in patients at accuracies of 70% (imatinib), 71% (lapatinib), 83% (sunitinib), 83% (erlotinib), 88% (sorafenib) and 91% (gefitinib). These best performing pathway-extended models demonstrated improved balance predicting both sensitive and resistant patient categories, with many of these genes having a known role in cancer aetiology. Ensemble machine learning-based averaging of multiple pathway-extended models derived for an individual drug increased accuracy to >70% for erlotinib, gefitinib, lapatinib and sorafenib. Through incorporation of novel cancer biomarkers, machine learning-based pathway-extended signatures display strong efficacy predicting both sensitive and resistant patient responses to chemotherapy.

15.
PLoS One ; 15(4): e0232008, 2020.
Article in English | MEDLINE | ID: mdl-32330192

ABSTRACT

BACKGROUND: Accurate radiation dose estimates are critical for determining eligibility for therapies by timely triaging of exposed individuals after large-scale radiation events. However, the universal assessment of a large population subjected to a nuclear spill incident or detonation is not feasible. Even with high-throughput dosimetry analysis, test volumes far exceed the capacities of first responders to measure radiation exposures directly, or to acquire and process samples for follow-on biodosimetry testing. AIM: To significantly reduce data acquisition and processing requirements for triaging of treatment-eligible exposures in population-scale radiation incidents. METHODS: Physical radiation plumes modelled nuclear detonation scenarios of simulated exposures at 22 US locations. Models assumed only location of the epicenter and historical, prevailing wind directions/speeds. The spatial boundaries of graduated radiation exposures were determined by targeted, multistep geostatistical analysis of small population samples. Initially, locations proximate to these sites were randomly sampled (generally 0.1% of population). Empirical Bayesian kriging established radiation dose contour levels circumscribing these sites. Densification of each plume identified critical locations for additional sampling. After repeated kriging and densification, overlapping grids between each pair of contours of successive plumes were compared based on their diagonal Bray-Curtis distances and root-mean-square deviations, which provided criteria (<10% difference) to discontinue sampling. RESULTS/CONCLUSIONS: We modeled 30 scenarios, including 22 urban/high-density and 2 rural/low-density scenarios under various weather conditions. Multiple (3-10) rounds of sampling and kriging were required for the dosimetry maps to converge, requiring between 58 and 347 samples for different scenarios. 
On average, 70±10% of locations where populations are expected to receive an exposure ≥2Gy were identified. Under sub-optimal sampling conditions, the number of iterations and samples were increased, and accuracy was reduced. Geostatistical mapping limits the number of required dose assessments, the time required, and radiation exposure to first responders. Geostatistical analysis will expedite triaging of acute radiation exposure in population-scale nuclear events.
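The stopping rule in the methods above compares successive kriged dose surfaces. The sketch below assumes both surfaces have already been interpolated onto the same grid and flattened into equal-length lists of dose values (Gy). It computes the Bray-Curtis distance and a root-mean-square deviation normalized by the mean dose of the earlier surface (that normalization, and the helper name `sampling_converged`, are assumptions), and stops sampling when both fall below 10%. It does not reproduce the "diagonal" comparison of contour pairs described in the abstract.

```python
import math
from typing import Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis distance: sum|u_i - v_i| / sum|u_i + v_i| (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


def relative_rmsd(u: Sequence[float], v: Sequence[float]) -> float:
    """Root-mean-square deviation scaled by the mean of u (assumed normalization)."""
    rmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))
    scale = sum(u) / len(u)
    return rmsd / scale if scale else 0.0


def sampling_converged(prev_grid: Sequence[float], new_grid: Sequence[float],
                       tol: float = 0.10) -> bool:
    """Stop sampling when both differences are below tol (10% by default)."""
    return (bray_curtis(prev_grid, new_grid) < tol
            and relative_rmsd(prev_grid, new_grid) < tol)


if __name__ == "__main__":
    previous = [2.0, 4.0, 6.0]  # hypothetical doses (Gy) at three grid cells
    current = [2.1, 4.0, 5.9]
    print(sampling_converged(previous, current))           # True
    print(sampling_converged(previous, [4.0, 4.0, 4.0]))   # False
```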


Subject(s)
Radiation Exposure/analysis , Radiometry/methods , Bayes Theorem , Humans , Models, Theoretical , Occupational Exposure/analysis , Radiation Dosage , Spatial Analysis , Triage , Weather
16.
Article in English | MEDLINE | ID: mdl-30652029

ABSTRACT

The selection of effective genes that accurately predict chemotherapy responses might improve cancer outcomes. We compare optimized gene signatures for cisplatin, carboplatin, and oxaliplatin responses in the same cell lines and validate each signature using data from patients with cancer. Supervised support vector machine learning is used to derive gene sets whose expression is related to the cell line GI50 values by backwards feature selection with cross-validation. Specific genes and functional pathways distinguishing sensitive from resistant cell lines are identified by contrasting signatures obtained at extreme and median GI50 thresholds. Ensembles of gene signatures at different thresholds are combined to reduce the dependence on specific GI50 values for predicting drug responses. The most accurate gene signatures for each platin are: cisplatin: BARD1, BCL2, BCL2L1, CDKN2C, FAAP24, FEN1, MAP3K1, MAPK13, MAPK3, NFKB1, NFKB2, SLC22A5, SLC31A2, TLR4, and TWIST1; carboplatin: AKT1, EIF3K, ERCC1, GNGT1, GSR, MTHFR, NEDD4L, NLRP1, NRAS, RAF1, SGK1, TIGD1, TP53, VEGFB, and VEGFC; and oxaliplatin: BRAF, FCGR2A, IGF1, MSH2, NAGK, NFE2L2, NQO1, PANK3, SLC47A1, SLCO1B1, and UGT1A1. Data from The Cancer Genome Atlas (TCGA) patients with bladder, ovarian, and colorectal cancer were used to test the cisplatin, carboplatin, and oxaliplatin signatures, resulting in 71.0%, 60.2%, and 54.5% accuracies in predicting disease recurrence and 59%, 61%, and 72% accuracies in predicting remission, respectively. One cisplatin signature predicted 100% of recurrence in non-smoking patients with bladder cancer (57% disease-free; N = 19), and 79% recurrence in smokers (62% disease-free; N = 35). This approach should be adaptable to other studies of chemotherapy responses, regardless of the drug or cancer types.
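Backward feature selection, as described above, starts from a candidate gene set and repeatedly drops genes while tracking a cross-validated score. A minimal, library-free sketch: `cv_score` is a hypothetical callable assumed to train and cross-validate an SVM on a gene subset and return its accuracy. The greedy "drop the least useful gene each round" rule and the toy scorer in the usage are illustrative assumptions, not the study's exact procedure.

```python
from typing import Callable, FrozenSet, Sequence, Tuple


def backward_selection(
    genes: Sequence[str],
    cv_score: Callable[[FrozenSet[str]], float],
    min_size: int = 1,
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination; returns the best-scoring subset seen."""
    current = frozenset(genes)
    best_set, best_score = current, cv_score(current)
    while len(current) > min_size:
        # Score every subset that drops one gene; sorted() makes ties deterministic.
        candidates = [(cv_score(current - {g}), current - {g}) for g in sorted(current)]
        score, current = max(candidates, key=lambda c: c[0])
        if score > best_score:
            best_set, best_score = current, score
    return best_set, best_score


if __name__ == "__main__":
    # Toy stand-in for cross-validated SVM accuracy: rewards two "useful" genes
    # and penalizes every extra gene (purely hypothetical).
    useful = {"BCL2", "TP53"}

    def toy_score(subset: FrozenSet[str]) -> float:
        return len(subset & useful) - 0.1 * len(subset - useful)

    print(backward_selection(["BCL2", "TP53", "GSR", "NRAS"], toy_score))
```

Keeping the best subset seen, rather than stopping at the first drop in score, avoids halting on a temporary dip as genes are removed.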

17.
F1000Res ; 7: 1908, 2018.
Article in English | MEDLINE | ID: mdl-31275557

ABSTRACT

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by comparing splice-junction-spanning read counts and transcript read abundance in RNA-Seq data with those from matched tissues and from tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations that weaken or abolish natural splice sites, and 222,071 mutations that strengthen cryptic splice sites (11,932 affect both simultaneously). A total of 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm that integrates germline heterozygosity, degree of information change and impact on expression was developed to classify variants into splicing molecular phenotypes. The classification thresholds were calibrated against phenotypic assignments in the ClinVar clinical database. Variants are partitioned into allele-specific alternative splicing, likely aberrant, and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon, "Validated Splicing Mutations", either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNA-Seq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.
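The "degree of information change" mentioned above can be illustrated with the individual information of a site, Ri = Σ over positions l of [2 + log2 f(b_l, l)] bits, and the change ΔRi = Ri(variant) − Ri(reference). A sketch, assuming a toy three-position base-frequency matrix (`TOY_FREQS`, invented for illustration, with all frequencies nonzero) and omitting the small-sample corrections real models apply. A negative ΔRi indicates a weakened site and a positive one a strengthened site.

```python
import math
from typing import Dict, List

# Hypothetical 3-position base-frequency matrix (not a real splice-site model).
TOY_FREQS: List[Dict[str, float]] = [
    {"A": 0.5, "C": 0.125, "G": 0.25, "T": 0.125},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
    {"A": 0.25, "C": 0.125, "G": 0.125, "T": 0.5},
]


def individual_information(seq: str, freqs: List[Dict[str, float]]) -> float:
    """Ri in bits: sum over positions of 2 + log2 f(base, position)."""
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the matrix width")
    return sum(2 + math.log2(freqs[pos][base]) for pos, base in enumerate(seq))


def delta_ri(ref: str, alt: str, freqs: List[Dict[str, float]]) -> float:
    """Information change caused by a variant (negative = weaker site)."""
    return individual_information(alt, freqs) - individual_information(ref, freqs)


if __name__ == "__main__":
    print(individual_information("AGT", TOY_FREQS))  # 3.0
    print(delta_ri("AGT", "ACT", TOY_FREQS))         # -2.0
```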


Subject(s)
Neoplasms , Alternative Splicing , Humans , Mutation , RNA Splice Sites , RNA Splicing
18.
F1000Res ; 7: 233, 2018.
Article in English | MEDLINE | ID: mdl-29904591

ABSTRACT

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large-scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation, and the expression of genes implicated in the literature as responsive to radiation exposure (n=998) was then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. Genes that appear more frequently across an ensemble of signatures may have a greater impact on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
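The mRMR ranking step in the methods above can be sketched with discretized expression values: each round picks the gene whose mutual information with the exposure label, minus its mean mutual information with genes already chosen, is highest. This is the "difference" form of mRMR; the abstract does not state whether the study used this form or the quotient form. Gene names, discretization, and data in the usage are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (bits) between two discrete variables."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mrmr_rank(features: Dict[str, Sequence[Hashable]],
              labels: Sequence[Hashable], k: int) -> List[str]:
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected: List[str] = []
    remaining = set(features)

    def score(gene: str) -> float:
        if not selected:
            return relevance[gene]
        redundancy = sum(mutual_information(features[gene], features[s])
                         for s in selected) / len(selected)
        return relevance[gene] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() breaks ties by name
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    dose_class = [0, 1, 2, 3]      # hypothetical dose categories
    genes = {
        "GENE_A": [0, 0, 1, 1],    # informative
        "GENE_B": [0, 0, 1, 1],    # redundant copy of GENE_A
        "GENE_C": [0, 1, 0, 1],    # informative and independent of GENE_A
    }
    print(mrmr_rank(genes, dose_class, 3))  # ['GENE_A', 'GENE_C', 'GENE_B']
```

The redundant copy drops to last even though its relevance equals that of the first pick, which is the behavior that distinguishes mRMR from ranking by relevance alone.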

19.
BMC Med Genomics ; 9: 19, 2016 Apr 11.
Article in English | MEDLINE | ID: mdl-27067391

ABSTRACT

BACKGROUND: Sequencing of both healthy and disease singletons yields many novel and low-frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been shown to contribute significantly to high-penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and for prioritizing patients with large intragenic deletions. METHODS: We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure. RESULTS: 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. An intragenic 32.1 kb interval in BRCA2 that was likely hemizygous was detected in one patient. We also identified 4 stop-gain variants and 3 reading-frame-altering exonic insertions/deletions (indels).
CONCLUSIONS: We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.


Subject(s)
Breast Neoplasms/genetics , DNA, Intergenic/genetics , Genetic Predisposition to Disease , Inheritance Patterns/genetics , Mutation/genetics , Ovarian Neoplasms/genetics , Base Sequence , Exons/genetics , Female , Humans , Information Theory , Molecular Sequence Data , Nucleic Acid Conformation , Polymorphism, Single Nucleotide/genetics , Protein Binding/genetics , Protein Isoforms/genetics , RNA Splice Sites/genetics , Sequence Alignment , Sequence Analysis, DNA , Sequence Deletion/genetics , Untranslated Regions/genetics
20.
F1000Res ; 5: 2124, 2016.
Article in English | MEDLINE | ID: mdl-28620450

ABSTRACT

Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify patients and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone therapy (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of another HT agent (tamoxifen) and of CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. Minimum redundancy maximum relevance feature selection of a paclitaxel-based SVM classifier based on expression of genes BCL2L1, BBC3, FGF2, FN1, and TWIST1 was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using a gene signature (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B) predicted >3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT.
These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies.
