ABSTRACT
Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining large and varied data sets. While integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure pose challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identifying clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the challenges of manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. We also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Subject(s)
Data Mining , Electronic Health Records , Genomics , Precision Medicine , Cloud Computing , Computational Biology/methods , Data Mining/methods , Databases, Factual , Genetic Variation , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Medical Informatics/methods , Precision Medicine/methods
ABSTRACT
IMPORTANCE: Large-scale DNA sequencing identifies incidental rare variants in established Mendelian disease genes, but the frequency of related clinical phenotypes in unselected patient populations is not well established. Phenotype data from electronic medical records (EMRs) may provide a resource to assess the clinical relevance of rare variants. OBJECTIVE: To determine the clinical phenotypes from EMRs for individuals with variants designated as pathogenic by expert review in arrhythmia susceptibility genes. DESIGN, SETTING, AND PARTICIPANTS: This prospective cohort study included 2022 individuals recruited for nonantiarrhythmic drug exposure phenotypes from October 5, 2012, to September 30, 2013, for the Electronic Medical Records and Genomics Network Pharmacogenomics project from 7 US academic medical centers. Variants in SCN5A and KCNH2, disease genes for long QT and Brugada syndromes, were assessed for potential pathogenicity by 3 laboratories with ion channel expertise and by comparison with the ClinVar database. Relevant phenotypes were determined from EMRs, with data available from 2002 (or earlier for some sites) through September 10, 2014. EXPOSURES: One or more variants designated as pathogenic in SCN5A or KCNH2. MAIN OUTCOMES AND MEASURES: Arrhythmia or electrocardiographic (ECG) phenotypes defined by International Classification of Diseases, Ninth Revision (ICD-9) codes, ECG data, and manual EMR review. RESULTS: Among 2022 study participants (median age, 61 years [interquartile range, 56-65 years]; 1118 [55%] female; 1491 [74%] white), a total of 122 rare (minor allele frequency <0.5%) nonsynonymous and splice-site variants in 2 arrhythmia susceptibility genes were identified in 223 individuals (11% of the study cohort). Forty-two variants in 63 participants were designated potentially pathogenic by at least 1 laboratory or ClinVar, with low concordance across laboratories (Cohen κ = 0.26). 
An ICD-9 code for arrhythmia was found in 11 of 63 (17%) variant carriers vs 264 of 1959 (13%) of those without variants (difference, +4%; 95% CI, -5% to +13%; P = .35). In the 1270 (63%) with ECGs, corrected QT intervals were not different in variant carriers vs those without (median, 429 vs 439 milliseconds; difference, -10 milliseconds; 95% CI, -16 to +3 milliseconds; P = .17). After manual review, 22 of 63 participants (35%) with designated variants had any ECG or arrhythmia phenotype, and only 2 had corrected QT interval longer than 500 milliseconds. CONCLUSIONS AND RELEVANCE: Among laboratories experienced in genetic testing for cardiac arrhythmia disorders, there was low concordance in designating SCN5A and KCNH2 variants as pathogenic. In an unselected population, the putatively pathogenic genetic variants were not associated with an abnormal phenotype. These findings raise questions about the implications of notifying patients of incidental genetic findings.
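The comparison of arrhythmia codes between carriers and non-carriers can be re-derived from the counts given in the abstract. The following is a minimal sketch that assumes a simple Wald (normal-approximation) interval; the abstract does not state which interval method the authors used, and the function name `proportion_difference` is introduced here only for illustration.

```python
from math import sqrt


def proportion_difference(x1, n1, x2, n2, z=1.96):
    """Return (difference, lower, upper) for p1 - p2 using a Wald interval.

    Assumption: a plain normal approximation, which may differ from the
    method used in the study.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Standard error of the difference between two independent proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Counts taken from the abstract: 11 of 63 carriers vs 264 of 1959 non-carriers
diff, lower, upper = proportion_difference(11, 63, 264, 1959)
print(f"difference = {diff:+.1%}, 95% CI {lower:+.1%} to {upper:+.1%}")
```

Under this approximation the interval spans zero and lands close to the reported one, which is consistent with the nonsignificant P value.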
Subject(s)
Arrhythmias, Cardiac/genetics , Electronic Health Records , Ether-A-Go-Go Potassium Channels/genetics , Genetic Variation , Laboratories/standards , NAV1.5 Voltage-Gated Sodium Channel/genetics , Phenotype , Aged , Aged, 80 and over , Alleles , Arrhythmias, Cardiac/ethnology , Arrhythmias, Cardiac/physiopathology , Brugada Syndrome/genetics , ERG1 Potassium Channel , Female , Genetic Predisposition to Disease , Genetic Testing/standards , Genomics , Heterozygote , Humans , Incidental Findings , Male , Middle Aged , Mutation, Missense , Prospective Studies , Random Allocation , Statistics, Nonparametric , Young Adult
ABSTRACT
Fcγ receptors (FcγRs) are membrane-bound glycoproteins that bind to the fragment crystallizable (Fc) constant regions of IgG antibodies. Interactions between IgG immune complexes and FcγRs can initiate signal transduction that mediates important components of the immune response, including activation of immune cells for clearance of opsonized pathogens or infected host cells. In humans, many studies have identified associations between FcγR gene polymorphisms and risk of infection or progression of disease, suggesting a gene-level impact on FcγR-dependent immune responses. Rhesus macaques are an important translational model for most human health interventions, yet little is known about the breadth of rhesus macaque FcγR genetic diversity. This lack of knowledge prevents evaluation of the impact of FcγR polymorphisms on outcomes of preclinical studies performed in rhesus macaques. In this study we used long-read RNA sequencing to define the genetic diversity of FcγRs in 206 Indian-origin rhesus macaques (Macaca mulatta). We describe the frequency of single nucleotide polymorphisms, insertions, deletions, frame-shift mutations, and isoforms. We also index the identified diversity using predicted and known rhesus macaque FcγR and Fc-FcγR structures. Future studies that define the functional significance of this genetic diversity will facilitate a better understanding of the correlation between human and macaque FcγR biology that is needed for effective translation of studies with antibody-mediated outcomes performed in rhesus macaques.
Subject(s)
Antigen-Antibody Complex , Receptors, IgG , Humans , Animals , Macaca mulatta , Sequence Analysis, RNA , Frameshift Mutation , Immunoglobulin G , Membrane Glycoproteins
ABSTRACT
Rhesus macaques (RMs) are a common preclinical model used to test HIV vaccine efficacy and passive immunization strategies. Yet, it remains unclear to what extent the Fc-Fc receptor (FcR) interactions impacting antiviral activities of antibodies in RMs recapitulate those in humans. Here, we evaluated the FcR-related functionality of natural killer cells (NKs) from peripheral blood of uninfected humans and RMs to identify intra- and inter-species variation. NKs were screened for FcγRIIIa (human) and FcγRIII (RM) genotypes (FcγRIII(a)), receptor signaling, and antibody-dependent cellular cytotoxicity (ADCC), the latter mediated by a cocktail of monoclonal IgG1 antibodies with human or RM Fc. FcγRIII(a) genetic polymorphisms alone did not explain differences in NK effector functionality in either species cohort. Using the same parameters, hierarchical clustering separated each species into two clusters. Importantly, in principal components analyses, ADCC magnitude, NK contribution to ADCC, FcγRIII(a) cell-surface expression, and frequency of phosphorylated CD3ζ NK cells all contributed similarly to the first principal component within each species, demonstrating the importance of measuring multiple facets of NK cell function. Although ADCC potency was similar between species, we detected significant differences in frequencies of NK cells and pCD3ζ+ cells, level of cell-surface FcγRIII(a) expression, and NK-mediated ADCC (P<0.001), indicating that a combination of Fc-FcR parameters contributes to overall inter-species functional differences. These data strongly support the importance of multi-parameter analyses of Fc-FcR NK-mediated functions when evaluating efficacy of passive and active immunizations in preclinical and clinical trials and identifying correlates of protection. The results also suggest that pre-screening animals for multiple FcR-mediated NK functions would ensure even distribution of animals among treatment groups in future preclinical trials.
Subject(s)
Antibodies, Monoclonal , Receptors, Fc , Animals , Humans , Receptors, Fc/metabolism , Macaca mulatta , Killer Cells, Natural , Multivariate Analysis , Cluster Analysis
ABSTRACT
Analyses of human clinical HIV-1 vaccine trials and preclinical vaccine studies performed in rhesus macaque (RM) models have identified associations between non-neutralizing Fc Receptor (FcR)-dependent antibody effector functions and reduced risk of infection. Specifically, antibody-dependent phagocytosis (ADP) has emerged as a common correlate of reduced infection risk in multiple RM studies and the human HVTN505 trial. This recurrent finding suggests that antibody responses with the capability to mediate ADP are most likely a desirable component of vaccine responses aimed at protecting against HIV-1 acquisition. As use of RM models is essential for development of the next generation of candidate HIV-1 vaccines, there is a need to determine how effectively ADP activity observed in RMs translates to activity in humans. In this study we compared ADP activity of human and RM monocytes and polymorphonuclear leukocytes (PMN) to bridge this gap in knowledge. We observed considerable variability in the magnitude of monocyte and PMN ADP activity across individual humans and RM that was not dependent on FcR alleles, and only modestly impacted by cell-surface levels of FcRs. Importantly, we found that for both human and RM phagocytes, ADP activity of antibodies targeting the CD4 binding site was greatest when mediated by human IgG3, followed by RM and human IgG1. These results demonstrate that there is functional homology between antibody and FcRs from these two species for ADP. We also used novel RM IgG1 monoclonal antibodies engineered with elongated hinge regions to show that hinge elongation augments RM ADP activity. The RM IgGs with engineered hinge regions can achieve ADP activity comparable to that observed with human IgG3. These novel modified antibodies will have utility in passive immunization studies aimed at defining the role of IgG3 and ADP in protection from virus challenge or control of disease in RM models. 
Our results contribute to a better translation of human and macaque antibody and FcR biology, and may help to improve testing accuracy and evaluations of future active and passive prevention strategies.
Subject(s)
Antibody-Dependent Cell Cytotoxicity/immunology , Phagocytes/immunology , Phagocytosis/immunology , Amino Acid Sequence , Animals , Biomarkers , HIV Infections/immunology , HIV Infections/virology , Humans , Immunoglobulin G/immunology , Immunoglobulin Isotypes/chemistry , Immunoglobulin Isotypes/genetics , Immunoglobulin Isotypes/immunology , Leukocytes, Mononuclear/immunology , Leukocytes, Mononuclear/metabolism , Macaca mulatta , Neutrophils/immunology , Neutrophils/metabolism , Phagocytes/metabolism , Receptors, IgG/genetics , Receptors, IgG/metabolism , Simian Acquired Immunodeficiency Syndrome/immunology , Simian Acquired Immunodeficiency Syndrome/virology , Species Specificity
ABSTRACT
RATIONALE: Hypertrophic cardiomyopathy (HCM) is an inherited myocardial disease and a common cause of sudden cardiac death, heart failure, atrial fibrillation, and stroke. In families affected by HCM, genotyping is useful for identifying susceptible relatives. In the present study, we investigated the disease-causing mutations in a three-generation Chinese family with HCM using whole exome sequencing (WES). PATIENT CONCERNS: The proband, a 50-year-old man, was diagnosed with HCM at the age of 41 years. He presented with an asymmetric hypertrophic interventricular septum and a maximum interventricular septum thickness of 18.04 mm. His third elder sister, niece, and daughter were also clinically affected by HCM. DIAGNOSIS: Autosomal dominant HCM. INTERVENTIONS: Seven family members, including 4 affected members, underwent WES. The genetic variants were subsequently called using the Genome Analysis Toolkit and annotated using the InterVar program. Following frequency filtration against the Genome Aggregation Database, the variants were evaluated using an in-house bioinformatics analysis pipeline. OUTCOMES: HCM was transmitted as an autosomal dominant trait in the family. An extremely rare stop-gained mutation, rs796925245 (g.1:201359630G>A, c.835C>T, p.Gln279Ter), in the troponin T2 (TNNT2) gene was identified as the disease-causing mutation. The stop-gained mutation was predicted to result in a truncated troponin T protein in the cardiac sarcomere. An adolescent family member who had normal echocardiographic measurements was found to carry the same disease-causing mutation. LESSONS: A novel nonsense TNNT2 mutation was identified as the HCM-causing mutation in this Chinese pedigree. Since HCM shows low penetrance by clinical criteria in adolescents, the adolescent mutation carrier, who is still clinically unaffected, should be offered routine follow-up and sports activity recommendations to prevent future adverse events, including sudden cardiac death.
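The filtering logic described under INTERVENTIONS (population-frequency filtration followed by evaluation within the family) can be sketched roughly as below. This is not the authors' in-house pipeline: the frequency cutoff, the record layout, the member labels, and the variant identifiers are all hypothetical placeholders chosen for illustration.

```python
def segregating_rare_variants(variants, affected, max_allele_frequency=0.001):
    """Keep rare variants carried by every affected family member.

    Unaffected relatives are deliberately not used to exclude variants,
    because an unaffected carrier is possible when penetrance is incomplete.
    """
    kept = []
    for variant in variants:
        is_rare = variant["population_af"] <= max_allele_frequency
        in_all_affected = affected <= variant["carriers"]  # subset test
        if is_rare and in_all_affected:
            kept.append(variant["id"])
    return kept


# Hypothetical family: four affected members and one unaffected carrier
affected_members = {"affected_1", "affected_2", "affected_3", "affected_4"}

# Hypothetical variant records (IDs and frequencies are made up)
example_variants = [
    {"id": "variant_A", "population_af": 0.00001,
     "carriers": affected_members | {"unaffected_1"}},
    {"id": "variant_B", "population_af": 0.15,      # too common
     "carriers": set(affected_members)},
    {"id": "variant_C", "population_af": 0.00002,   # does not segregate
     "carriers": {"affected_1", "affected_2"}},
]

candidates = segregating_rare_variants(example_variants, affected_members)
print(candidates)
```

Requiring presence in all affected members, while tolerating carriers who are currently unaffected, mirrors the situation in the abstract where a clinically unaffected adolescent carries the mutation.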
Subject(s)
Cardiomyopathy, Hypertrophic, Familial/genetics , Troponin T/genetics , Adolescent , Adult , Aged , Asian People/genetics , Cardiomyopathy, Hypertrophic, Familial/complications , Cardiomyopathy, Hypertrophic, Familial/diagnosis , Codon, Nonsense , Death, Sudden, Cardiac/etiology , Echocardiography/methods , Female , Humans , Hypertrophy/diagnostic imaging , Male , Middle Aged , Pedigree , Phenotype , Ventricular Septum/pathology , Exome Sequencing/methods
ABSTRACT
BACKGROUND: Clinical laboratories implement a variety of measures to classify somatic sequence variants and identify clinically significant variants to facilitate the implementation of precision medicine. To standardize the interpretation process, the Association for Molecular Pathology (AMP), American Society of Clinical Oncology (ASCO), and College of American Pathologists (CAP) published guidelines for the interpretation and reporting of sequence variants in cancer in 2017. These guidelines classify somatic variants using a four-tiered system with ten criteria. Even with the standardized guidelines, assessing the clinical impact of somatic variants remains tedious. Additionally, manual implementation of the guidelines may vary among professionals and may lack reproducibility when the supporting evidence is not documented in a consistent manner. RESULTS: We developed a semi-automated tool called "Variant Interpretation for Cancer" (VIC) to accelerate the interpretation process and minimize individual biases. VIC takes pre-annotated files and automatically classifies sequence variants based on several criteria, with the ability for users to integrate additional evidence to optimize the interpretation of clinical impact. We evaluated VIC using several publicly available databases and compared it with several predictive software programs. We found that VIC is time-efficient and conservative in classifying somatic variants under default settings, especially for variants with strong and/or potential clinical significance. We also tested VIC on two cancer-panel sequencing datasets to show its effectiveness in facilitating manual interpretation of somatic variants. CONCLUSIONS: Although VIC cannot replace human reviewers, it will accelerate the interpretation of somatic variants. VIC can also be customized by clinical laboratories to fit into their analytical pipelines to facilitate the laborious process of somatic variant interpretation.
VIC is freely available at https://github.com/HGLab/VIC/.
Subject(s)
Computational Biology , Genetic Predisposition to Disease , Genetic Variation , Neoplasms/genetics , Software , Alleles , Biomarkers, Tumor , Computational Biology/methods , Databases, Genetic , Gene Frequency , Genetic Testing , Germ-Line Mutation , Humans , Molecular Sequence Annotation , Neoplasms/diagnosis , Precision Medicine
ABSTRACT
Asian Americans (AS) have significantly lower incidence and mortality rates of breast cancer than Caucasian Americans (CA). Although this racial disparity has been documented, the underlying pathogenetic factors explaining it are obscure. We addressed this issue with an integrative genomics approach to compare mRNA expression between AS and CA cases of breast cancer. RNA-seq data from The Cancer Genome Atlas revealed significant differences in mRNA expression at the gene and pathway levels. Increased susceptibility and severity in CA patients were likely the result of synergistic environmental and genetic risk factors, with arachidonic acid metabolism and PPAR signaling pathways implicated in linking environmental and genetic factors. An analysis that also added eQTL data from the Genotype-Tissue Expression Project and SNP data from the 1000 Genomes Project identified several SNPs associated with differentially expressed genes. Overall, the associations we identified may enable a more focused study of genotypic differences that may help explain the disparity in breast cancer incidence and mortality rates in CA and AS populations and inform precision medicine. Cancer Res; 77(2); 423-33. ©2016 AACR.
Subject(s)
Breast Neoplasms/ethnology , Breast Neoplasms/genetics , Precision Medicine/methods , RNA, Messenger/genetics , Adult , Aged , Asian/genetics , Female , Gene Expression Profiling , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Polymorphism, Single Nucleotide , RNA, Messenger/analysis , Transcriptome , White People/genetics
ABSTRACT
Drug and xenobiotic metabolizing enzymes (DXME) play important roles in drug responses and carcinogenesis. Recent studies have found that expression of DXME in cancer cells significantly affects drug clearance and the onset of drug resistance. In this study we compared the expression of DXME in breast tumor tissue samples from patients representing three ethnic groups: Caucasian Americans (CA), African Americans (AA), and Asian Americans (AS). We further combined DXME gene expression data with eQTL data from the GTEx Project and with allele frequency data from the 1000 Genomes Project to identify SNPs that may be associated with differential expression of DXME genes. We identified substantial differences among CA, AA, and AS populations in the expression of DXME genes and in activation of pathways involved in drug metabolism, including those involved in metabolizing chemotherapy drugs that are commonly used in the treatment of breast cancer. These data suggest that differential expression of DXME may be associated with the health disparities in breast cancer outcomes observed among these three ethnic groups. Our study suggests that development of personalized treatment strategies for breast cancer patients could be improved by considering both germline genotypes and tumor-specific mutations and expression profiles related to DXME genes.
Subject(s)
Antineoplastic Agents/metabolism , Breast Neoplasms/genetics , Cytochrome P-450 Enzyme System/genetics , Gene Expression Regulation, Neoplastic , Inactivation, Metabolic/genetics , Neoplasm Proteins/genetics , Alleles , Antineoplastic Agents/therapeutic use , Asian People , Black People , Breast Neoplasms/drug therapy , Breast Neoplasms/enzymology , Breast Neoplasms/ethnology , Cytochrome P-450 Enzyme System/classification , Cytochrome P-450 Enzyme System/metabolism , Databases, Factual , Female , Gene Frequency , Healthcare Disparities , Humans , Neoplasm Proteins/classification , Neoplasm Proteins/metabolism , Neoplasm Staging , Precision Medicine , Treatment Outcome , White People , Xenobiotics/metabolism , Xenobiotics/therapeutic use
ABSTRACT
Advances in genomic medicine have the potential to change the way we treat human disease, but translating these advances into reality for improving healthcare outcomes depends essentially on our ability to discover disease- and/or drug-associated clinically actionable genetic mutations. Integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a big data infrastructure can provide an efficient and effective way to identify clinically actionable genetic variants for personalized treatments and reduce healthcare costs. We review bioinformatics processing of next-generation sequencing (NGS) data, bioinformatics infrastructures for implementing precision medicine, and bioinformatics approaches for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Subject(s)
Electronic Health Records , Genetic Variation , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Precision Medicine , Computational Biology , Humans , Medical Informatics , Molecular Targeted Therapy
ABSTRACT
BACKGROUND: Many new biomedical research articles are published every day, accumulating rich information on genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining of large-scale scientific literature can uncover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. RESULTS: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. CONCLUSIONS: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time updates as new articles are published daily. SparkText can be extended to other areas of biomedical research.
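To make the classification step concrete, here is a minimal single-machine sketch of a multinomial Naïve Bayes text classifier, one of the three model families named in the abstract. It is not SparkText's code: it uses neither Apache Spark nor Cassandra, and the function names and the six toy training snippets are hypothetical stand-ins for the PubMed articles.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus standing in for labeled articles
TRAINING = [
    ("brca1 mutation mammography breast tumor", "breast"),
    ("estrogen receptor breast carcinoma tamoxifen", "breast"),
    ("psa screening prostate androgen tumor", "prostate"),
    ("prostate biopsy gleason androgen deprivation", "prostate"),
    ("smoking lung nodule egfr tumor", "lung"),
    ("lung adenocarcinoma egfr inhibitor smoking", "lung"),
]

model = train_naive_bayes(TRAINING)
print(predict(model, "egfr mutation in smokers with lung nodule"))
```

A framework of the kind described would distribute the counting and scoring across a cluster; the per-document arithmetic stays the same.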
ABSTRACT
Structural analysis of microscopic objects is a longstanding topic in several scientific disciplines, such as biological, mechanical, and materials sciences. The scanning electron microscope (SEM), a promising imaging instrument that has been in use for decades, determines the surface properties (e.g., composition or geometry) of specimens by achieving high magnification and contrast and a resolution finer than one nanometer. Although SEM micrographs remain two-dimensional (2D), many research and educational questions require knowledge of the specimens' three-dimensional (3D) structures. 3D surface reconstruction from SEM images improves understanding of microscopic surfaces, allowing informative and qualitative visualization of the samples being investigated. In this contribution, we integrate several computational technologies, including machine learning, a contrario methodology, and epipolar geometry, to design and develop a novel and efficient method called 3DSEM++ for adaptive and intelligent multi-view 3D SEM surface reconstruction. Experiments performed on real and synthetic data indicate that the approach achieves high precision in both SEM extrinsic calibration and 3D surface modeling.
ABSTRACT
Several important and fundamental aspects of disease genetics models have yet to be described. One such property is the relationship between disease association statistics at a marker site and those at a closely linked disease-causing site. A complete description of this two-locus system is of particular importance to experimental efforts to fine map association signals for complex diseases. Here, we present a simple relationship between disease association statistics and the decline of linkage disequilibrium from a causal site. Specifically, the ratio of the chi-square disease association statistic at a marker site to that at the causal site is equivalent to the standard measure of pairwise linkage disequilibrium, r². A complete derivation of this relationship from a general disease model is shown. Notably, this relationship holds across all modes of inheritance. Extensive Monte Carlo simulations using a disease genetics model applied to chromosomes subjected to a standard model of recombination are employed to better understand the variation around this fine mapping theorem due to sampling effects. We also use this relationship to provide a framework for estimating properties of a non-interrogated causal site using data at closely linked markers. Lastly, we apply this approach to association data from high-density genotyping in a large, publicly available data set investigating extreme BMI. We anticipate that understanding the patterns of disease association decay with declining linkage disequilibrium from a causal site will enable more powerful fine mapping methods and provide new avenues for identifying causal sites/genes from fine-mapping studies.
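The stated relationship can be checked numerically in a simplified setting. The sketch below is not the paper's derivation or simulation: it assumes an allelic (chromosome-level) chi-square test, works with expected allele frequencies rather than sampled counts, assumes the marker has no effect beyond its linkage to the causal allele, and computes r² in the pooled case-control sample. All numeric inputs and function names are hypothetical.

```python
def allelic_chi_square(freq_cases, freq_controls, n_cases, n_controls):
    """1-df chi-square comparing an allele frequency between two groups.

    n_cases and n_controls are numbers of chromosomes, not individuals.
    """
    pooled = (freq_cases * n_cases + freq_controls * n_controls) / (n_cases + n_controls)
    variance = pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls)
    return (freq_cases - freq_controls) ** 2 / variance


# Hypothetical inputs
p_cases, p_controls = 0.30, 0.20   # causal-allele frequency in each group
m_given_causal = 0.80              # P(marker allele | causal allele on the same chromosome)
m_given_other = 0.10               # P(marker allele | non-causal allele)
n_cases = n_controls = 2000        # chromosomes per group

# Marker-allele frequencies follow from the causal-allele frequencies,
# because the marker matters only through linkage to the causal site.
q_cases = m_given_causal * p_cases + m_given_other * (1 - p_cases)
q_controls = m_given_causal * p_controls + m_given_other * (1 - p_controls)

chi_causal = allelic_chi_square(p_cases, p_controls, n_cases, n_controls)
chi_marker = allelic_chi_square(q_cases, q_controls, n_cases, n_controls)
ratio = chi_marker / chi_causal

# r^2 between the two sites in the pooled sample
p = (p_cases + p_controls) / 2     # equal group sizes
q = m_given_causal * p + m_given_other * (1 - p)
d = m_given_causal * p - p * q     # D = P(causal and marker) - p*q
r_squared = d ** 2 / (p * (1 - p) * q * (1 - q))

print(f"chi-square ratio = {ratio:.4f}, r^2 = {r_squared:.4f}")
```

With sampled genotypes instead of expected frequencies, the ratio would scatter around r², which is the sampling variation the abstract says the Monte Carlo simulations were used to characterize.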
ABSTRACT
BACKGROUND: It is unclear whether and how whole-genome sequencing (WGS) data can be used to implement genomic medicine. Our objective is to retrospectively evaluate whether WGS can help improve prevention and care for patients with susceptibility to cancer syndromes. METHODS AND FINDINGS: We analyzed genetic mutations in 60 autosomal dominant cancer-predisposition genes in 300 deceased patients with WGS data and nearly complete long-term (over 30 years) medical records. To infer biological insights from massive amounts of WGS data and comprehensive clinical data in a short period of time, we developed an in-house analysis pipeline within the SeqHBase software framework to quickly identify pathogenic or likely pathogenic variants. The clinical data of the patients who carried pathogenic and/or likely pathogenic variants were further reviewed to assess their clinical conditions using their lifetime EHRs. Among the 300 participants, 5 (1.7%) carried pathogenic or likely pathogenic variants in 5 cancer-predisposing genes: one each in APC, BRCA1, BRCA2, NF1, and TP53. When assessing the clinical data, each of the 5 patients had one or more different types of cancer, fully consistent with their genetic profiles. Among these 5 patients, 2 died of cancer while the others developed multiple disorders later in life; had these patients undergone genetic testing earlier in life, they might have benefited from early diagnosis and treatment and lived healthier lives. CONCLUSIONS: We demonstrated a case study where the discovery of pathogenic or likely pathogenic germline mutations from population-wide WGS correlates with clinical outcome. The use of WGS may have clinical impact by improving healthcare delivery.