Results 1 - 20 of 35
1.
Mol Neurobiol ; 60(2): 1083-1098, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36414910

ABSTRACT

Schizophrenia presents clinical and biological differences between males and females. This study investigated transcriptional profiles in the dorsolateral prefrontal cortex (DLPFC) using postmortem data from the largest RNA-sequencing (RNA-seq) database on schizophrenic cases and controls. Data for 154 male and 113 female controls and 160 male and 93 female schizophrenic cases were obtained from the CommonMind Consortium. In the RNA-seq database, the principal component analysis showed that sex effects were small in schizophrenia. After we analyzed the impact of sex-specific differences on gene expression, the female group showed more significantly changed genes compared with the male group. Based on the gene ontology analysis, the female sex-specific genes that changed were overrepresented in the mitochondrion, ATP (phosphocreatine and adenosine triphosphate)-, and metal ion-binding relevant biological processes. An ingenuity pathway analysis revealed that the differentially expressed genes related to schizophrenia in the female group were involved in midbrain dopaminergic and γ-aminobutyric acid (GABA)-ergic neurons and microglia. We used methylated DNA-binding domain-sequencing analyses and microarray to investigate the DNA methylation that potentially impacts the sex differences in gene transcription using a maternal immune activation (MIA) murine model. Among the sex-specific positional genes related to schizophrenia in the PFC of female offspring from MIA, the changes in the methylation and transcriptional expression of loci ACSBG1 were validated in the females with schizophrenia in independent postmortem samples by real-time PCR and pyrosequencing. Our results reveal potential genetic risks in the DLPFC for the sex-dependent prevalence and symptomology of schizophrenia.


Subject(s)
Schizophrenia , Animals , Female , Humans , Male , Mice , Dorsolateral Prefrontal Cortex , Prefrontal Cortex/metabolism , Schizophrenia/genetics , Schizophrenia/metabolism , Sex Characteristics , Transcriptome/genetics
2.
Cell Genom ; 2(12): 100214, 2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36778047

ABSTRACT

We combined functional genomics and human genetics to investigate processes that affect type 1 diabetes (T1D) risk by mediating beta cell survival in response to proinflammatory cytokines. We mapped 38,931 cytokine-responsive candidate cis-regulatory elements (cCREs) in beta cells using ATAC-seq and snATAC-seq and linked them to target genes using co-accessibility and HiChIP. Using a genome-wide CRISPR screen in EndoC-βH1 cells, we identified 867 genes affecting cytokine-induced survival, and genes promoting survival and up-regulated in cytokines were enriched at T1D risk loci. Using SNP-SELEX, we identified 2,229 variants in cytokine-responsive cCREs altering transcription factor (TF) binding, and variants altering binding of TFs regulating stress, inflammation, and apoptosis were enriched for T1D risk. At the 16p13 locus, a fine-mapped T1D variant altering TF binding in a cytokine-induced cCRE interacted with SOCS1, which promoted survival in cytokine exposure. Our findings reveal processes and genes acting in beta cells during inflammation that modulate T1D risk.

3.
Nature ; 591(7848): 147-151, 2021 03.
Article in English | MEDLINE | ID: mdl-33505025

ABSTRACT

Many sequence variants have been linked to complex human traits and diseases, but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor-DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas the support vector machine combined with the gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and understanding of the molecular pathways involved in diverse human traits and diseases.


Subject(s)
Polymorphism, Single Nucleotide/genetics , SELEX Aptamer Technique , Support Vector Machine , Transcription Factors/metabolism , Binding Sites/genetics , Disease/genetics , Genome, Human/genetics , Humans , Ligands , Protein Binding
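Illustrative note: the abstract above evaluates position weight matrices (PWMs) as predictors of how a variant changes transcription factor binding. The following Python sketch shows the basic idea of scoring a reference and an alternate allele against a PWM and taking the difference. The matrix values, the sequences, and all function names are invented for illustration; this is not the SNP-SELEX analysis code, and it does not implement the gapped k-mer SVM models the authors report.

```python
import math

# Toy position frequency matrix for a 4-bp motif (hypothetical values).
# Each entry gives the probability of A, C, G, T at one motif position.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform background base frequency (assumption)


def pwm_score(seq):
    """Sum of log2 odds (motif vs. background) over the motif positions."""
    if len(seq) != len(PFM):
        raise ValueError("sequence length must equal motif length")
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(seq))


def best_window_score(seq):
    """Best PWM score over all motif-length windows of seq."""
    width = len(PFM)
    return max(pwm_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


def allele_delta(ref_seq, alt_seq):
    """Predicted change in binding score caused by the variant (alt minus ref)."""
    return best_window_score(alt_seq) - best_window_score(ref_seq)


ref = "TTAGCTAA"  # reference allele context containing the consensus AGCT
alt = "TTAGTTAA"  # same context with a C>T substitution inside the motif
delta = allele_delta(ref, alt)
print(f"delta score = {delta:.3f}")
```

A negative delta means the alternate allele is predicted to bind less well than the reference. The abstract's point is that this kind of additive score is often not predictive enough on its own.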
4.
PLoS Comput Biol ; 16(6): e1007933, 2020 06.
Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, developing a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.


Subject(s)
Genome, Human , Genomic Structural Variation , Heuristics , Humans , INDEL Mutation
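Illustrative note: the concordance figures quoted in the abstract above are agreement rates between curators' labels. A minimal Python sketch of one plausible way to compute pairwise concordance follows. The curator names, event identifiers, and labels are hypothetical, and the study may have defined concordance differently (for example, across SV type, size, and genotype jointly).

```python
from itertools import combinations

# Hypothetical labels: curator -> {event_id: label}. Not data from the study.
labels = {
    "curator_a": {"sv1": "DEL", "sv2": "INS", "sv3": "DEL", "sv4": "INS"},
    "curator_b": {"sv1": "DEL", "sv2": "INS", "sv3": "INS", "sv4": "INS"},
    "curator_c": {"sv1": "DEL", "sv2": "DEL", "sv3": "DEL"},
}


def concordance(a, b):
    """Fraction of events labelled by both curators on which they agree.

    Returns None when the two curators share no events.
    """
    shared = set(a) & set(b)
    if not shared:
        return None
    return sum(a[event] == b[event] for event in shared) / len(shared)


# Concordance for every unordered pair of curators.
pairwise = {
    (x, y): concordance(labels[x], labels[y])
    for x, y in combinations(sorted(labels), 2)
}
for pair, value in pairwise.items():
    print(pair, f"{value:.2f}")
```

Only events that both curators labelled count towards a pair's rate, so curators who reviewed different subsets can still be compared.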
5.
Elife ; 8, 2019 11 20.
Article in English | MEDLINE | ID: mdl-31746734

ABSTRACT

The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explored GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examined the association between the 8.1 ancestral haplotype and delayed colonization in Cystic Fibrosis, postulating that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.


Subject(s)
Cystic Fibrosis/genetics , Genetic Predisposition to Disease , Major Histocompatibility Complex/genetics , Quantitative Trait Loci/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Alleles , Chromosome Mapping , Cystic Fibrosis/pathology , Female , Genome-Wide Association Study , HLA Antigens/genetics , Haplotypes , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , RNA-Seq , Young Adult
6.
Nat Commun ; 10(1): 2078, 2019 05 07.
Article in English | MEDLINE | ID: mdl-31064983

ABSTRACT

Genetic variants affecting pancreatic islet enhancers are central to T2D risk, but the gene targets of islet enhancer activity are largely unknown. We generate a high-resolution map of islet chromatin loops using Hi-C assays in three islet samples and use loops to annotate target genes of islet enhancers defined using ATAC-seq and published ChIP-seq data. We identify candidate target genes for thousands of islet enhancers, and find that enhancer looping is correlated with islet-specific gene expression. We fine-map T2D risk variants affecting islet enhancers, and find that candidate target genes of these variants defined using chromatin looping and eQTL mapping are enriched in protein transport and secretion pathways. At IGF2BP2, a fine-mapped T2D variant reduces islet enhancer activity and IGF2BP2 expression, and conditional inactivation of IGF2BP2 in mouse islets impairs glucose-stimulated insulin secretion. Our findings provide a resource for studying islet enhancer function and identifying genes involved in T2D risk.


Subject(s)
Chromatin/metabolism , Diabetes Mellitus, Type 2/genetics , Gene Regulatory Networks/genetics , Islets of Langerhans/metabolism , RNA-Binding Proteins/genetics , Adult , Animals , Cell Nucleus/metabolism , Chromatin Assembly and Disassembly/genetics , Diabetes Mellitus, Type 2/pathology , Enhancer Elements, Genetic/genetics , Female , Gene Expression Profiling , Genetic Predisposition to Disease , Glucose/metabolism , Humans , Insulin/metabolism , Islets of Langerhans/cytology , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , Middle Aged , Molecular Conformation , Quantitative Trait Loci/genetics , RNA-Binding Proteins/metabolism
7.
Pharmacogenomics J ; 19(2): 136-146, 2019 04.
Article in English | MEDLINE | ID: mdl-29352165

ABSTRACT

Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), in which the SMRT sequencing and additional short-read data were used. The panel consisted of 139 alleles, which were all extended from known IPD-IMGT/HLA sequences, contained 40 with novel variants, and captured more than 96.5% of allelic diversity in 1KJPN. These newly available sequences would be important resources for research and clinical applications including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.


Subject(s)
Genetic Variation , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Histocompatibility Antigens Class I/genetics , Alleles , Genotype , Histocompatibility Testing , Humans , Japan , Sequence Analysis, DNA
8.
Genetics ; 207(4): 1301-1312, 2017 12.
Article in English | MEDLINE | ID: mdl-29074555

ABSTRACT

Expression quantitative trait loci (eQTL) studies have typically used single-variant association analysis to identify genetic variants correlated with gene expression. However, this approach has several drawbacks: causal variants cannot be distinguished from nonfunctional variants in strong linkage disequilibrium, combined effects from multiple causal variants cannot be captured, and low-frequency (<5% MAF) eQTL variants are difficult to identify. While these issues possibly could be overcome by using sparse polygenic models, which associate multiple genetic variants with gene expression simultaneously, the predictive performance of these models for eQTL studies has not been evaluated. Here, we assessed the ability of three sparse polygenic models (Lasso, Elastic Net, and BSLMM) to identify causal variants, and compared their efficacy to single-variant association analysis and a fine-mapping model. Using simulated data, we determined that, while these methods performed similarly when there was one causal SNP present at a gene, BSLMM substantially outperformed single-variant association analysis for prioritizing causal eQTL variants when multiple causal eQTL variants were present (1.6- to 5.2-fold higher recall at 20% precision), and identified up to 2.3-fold more low frequency variants as the top eQTL SNP. Analysis of real RNA-seq and whole-genome sequencing data of 131 iPSC samples showed that the eQTL SNPs identified by BSLMM had a higher functional enrichment in DHS sites and were more often low-frequency than those identified with single-variant association analysis. Our study showed that BSLMM is a more effective approach than single-variant association analysis for prioritizing multiple causal eQTL variants at a single gene.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study/statistics & numerical data , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Gene Expression/genetics , Genetic Variation , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide/genetics
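Illustrative note: the abstract above compares methods by "recall at 20% precision". A minimal Python sketch of that metric follows, assuming each method outputs a ranking of variants and that the truly causal variants are known, as in a simulation. The rankings below are hypothetical, and the authors' exact evaluation procedure may differ.

```python
def recall_at_precision(ranked_is_causal, min_precision):
    """Highest recall reached at any rank cutoff whose precision is >= min_precision.

    ranked_is_causal: list of 0/1 flags, best-ranked variant first,
    where 1 marks a truly causal variant.
    """
    total_causal = sum(ranked_is_causal)
    if total_causal == 0:
        raise ValueError("no causal variants in the ranking")
    best_recall = 0.0
    true_positives = 0
    for rank, is_causal in enumerate(ranked_is_causal, start=1):
        true_positives += is_causal
        precision = true_positives / rank
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_causal)
    return best_recall


# Hypothetical rankings of 10 and 20 variants with three causal variants each.
method_a = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
method_b = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]

recall_a = recall_at_precision(method_a, 0.20)
recall_b = recall_at_precision(method_b, 0.20)
print(f"method A: {recall_a:.2f}, method B: {recall_b:.2f}")
```

Here the first ranking places all three causal variants within its top five, while the second recovers only one before precision falls below the threshold.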
9.
Invest Ophthalmol Vis Sci ; 58(5): 2818-2831, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28564705

ABSTRACT

Purpose: This study evaluated specific relationships between pathogenic mechanisms and genetic polymorphisms in primary open-angle glaucoma (POAG). We analyzed the morphologies of trabeculectomy specimens obtained from patients with familial POAG. Methods: We used light microscopy and transmission electron microscopy to examine specimens obtained from 17 eyes of 14 patients with familial POAG. We also conducted exome analyses of two families and used targeted Sanger sequencing to analyze samples obtained from the remaining patients. Results: The POAG cases examined in this study were divided into two groups based on morphologic characteristics. Group A eyes (7 eyes from 5 patients) had an abnormally thick trabecular meshwork (TM), whereas group B eyes (10 eyes from 9 patients) had a TM of normal thickness. The characteristics of the outflow routes in group A eyes were remarkable and included apoptotic TM cells, abnormally thickened TM basement membranes, fused TM beams, and occluded Schlemm's canals. All group A patients harbored mutations (F369L, P370L, T377M, and T448P) in the myocilin (MYOC) gene that were not found in group B patients. Conclusions: Although age matching of morphologic changes in the outflow routes was impossible due to the small sample size, this study suggests that abnormal TM cells may cause sequential damage in abnormally thickened TM basement membranes, TM cell apoptosis, TM beam fusion, and the occlusion of Schlemm's canals. The four detected MYOC mutations appeared to be associated with morphologic changes in the TM and the underlying pathogenesis of a subtype of familial POAG.


Subject(s)
Aqueous Humor/metabolism , Cytoskeletal Proteins/genetics , Eye Proteins/genetics , Glaucoma, Open-Angle/genetics , Glycoproteins/genetics , Limbus Corneae/pathology , Mutation , Polymorphism, Genetic , Trabecular Meshwork/pathology , Adult , Aged , Aged, 80 and over , Asian People/genetics , Biomarkers/metabolism , Exome/genetics , Female , Glaucoma, Open-Angle/pathology , Glaucoma, Open-Angle/surgery , Humans , Intraocular Pressure , Japan , Limbus Corneae/metabolism , Male , Middle Aged , Pedigree , Polymerase Chain Reaction , Trabecular Meshwork/metabolism , Trabeculectomy
10.
Cell Stem Cell ; 20(4): 533-546.e7, 2017 04 06.
Article in English | MEDLINE | ID: mdl-28388430

ABSTRACT

In this study, we used whole-genome sequencing and gene expression profiling of 215 human induced pluripotent stem cell (iPSC) lines from different donors to identify genetic variants associated with RNA expression for 5,746 genes. We were able to predict causal variants for these expression quantitative trait loci (eQTLs) that disrupt transcription factor binding and validated a subset of them experimentally. We also identified copy-number variant (CNV) eQTLs, including some that appear to affect gene expression by altering the copy number of intergenic regulatory regions. In addition, we were able to identify effects on gene expression of rare genic CNVs and regulatory single-nucleotide variants and found that reactivation of gene expression on the X chromosome depends on gene chromosomal position. Our work highlights the value of iPSCs for genetic association analyses and provides a unique resource for investigating the genetic regulation of gene expression in pluripotent cells.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation , Genetic Variation , Induced Pluripotent Stem Cells/metabolism , Binding Sites/genetics , Cellular Reprogramming/genetics , Chromosomes, Human, X/genetics , DNA Copy Number Variations/genetics , Genetic Heterogeneity , Humans , Molecular Sequence Annotation , Quantitative Trait Loci/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism
11.
BMC Bioinformatics ; 18(1): 207, 2017 Apr 07.
Article in English | MEDLINE | ID: mdl-28388874

ABSTRACT

BACKGROUND: Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed. RESULTS: This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite "pgltools": a cross-platform, PyPy-compatible Python package available both as an easy-to-use UNIX package and as a Python module, for integration into pipelines of paired-genomic-loci analyses. CONCLUSIONS: Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools, and a Python module of the operations can be installed from PyPI via the PyGLtools module.


Subject(s)
Chromatin/metabolism , Genomics/methods , Software , Chromatin/genetics , Chromatin Immunoprecipitation , Genetic Loci , High-Throughput Nucleotide Sequencing
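Illustrative note: the abstract above concerns data in which each record is a pair of genomic loci, and mentions intersection as a typical operation. The Python sketch below shows what intersecting two such records could mean. The `Locus` and `PairedLoci` types, the coordinate convention, and the function names are assumptions made for this example. They are not the PGL file specification and not the pgltools API.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


class PairedLoci(NamedTuple):
    a: Locus  # first anchor of the interaction
    b: Locus  # second anchor of the interaction


def loci_overlap(x, y):
    """True if two loci are on the same chromosome and share at least one base."""
    return x.chrom == y.chrom and x.start < y.end and y.start < x.end


def pairs_intersect(p, q):
    """True if both anchors of p overlap the anchors of q, in either orientation."""
    same_order = loci_overlap(p.a, q.a) and loci_overlap(p.b, q.b)
    swapped = loci_overlap(p.a, q.b) and loci_overlap(p.b, q.a)
    return same_order or swapped


# Hypothetical chromatin loops.
loop1 = PairedLoci(Locus("chr1", 100, 200), Locus("chr1", 5000, 5100))
loop2 = PairedLoci(Locus("chr1", 5050, 5150), Locus("chr1", 150, 250))
loop3 = PairedLoci(Locus("chr1", 150, 250), Locus("chr2", 5000, 5100))

print(pairs_intersect(loop1, loop2))  # anchors match in swapped order
print(pairs_intersect(loop1, loop3))  # only one anchor overlaps
```

Requiring both anchors to overlap is what sets paired-loci intersection apart from ordinary single-interval intersection. Two loops that share only one anchor are treated as different interactions.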
12.
Stem Cell Reports ; 8(4): 1086-1100, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28410642

ABSTRACT

Large-scale collections of induced pluripotent stem cells (iPSCs) could serve as powerful model systems for examining how genetic variation affects biology and disease. Here we describe the iPSCORE resource: a collection of systematically derived and characterized iPSC lines from 222 ethnically diverse individuals that allows for both familial and association-based genetic studies. iPSCORE lines are pluripotent with high genomic integrity (no or low numbers of somatic copy-number variants) as determined using high-throughput RNA-sequencing and genotyping arrays, respectively. Using iPSCs from a family of individuals, we show that iPSC-derived cardiomyocytes demonstrate gene expression patterns that cluster by genetic background, and can be used to examine variants associated with physiological and disease phenotypes. The iPSCORE collection contains representative individuals for risk and non-risk alleles for 95% of SNPs associated with human phenotypes through genome-wide association studies. Our study demonstrates the utility of iPSCORE for examining how genetic variants influence molecular and physiological traits in iPSCs and derived cell lines.


Subject(s)
Arrhythmias, Cardiac/genetics , Databases, Factual , Genetic Association Studies , Genetic Variation , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/metabolism , Arrhythmias, Cardiac/ethnology , Arrhythmias, Cardiac/metabolism , Arrhythmias, Cardiac/physiopathology , Cell Differentiation , Cell Line , Cellular Reprogramming/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Induced Pluripotent Stem Cells/cytology , Multigene Family , Myocytes, Cardiac/cytology , Oligonucleotide Array Sequence Analysis , Phenotype , Polymorphism, Single Nucleotide , Racial Groups
14.
Physiol Genomics ; 48(12): 922-927, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27764769

ABSTRACT

While more than 250 genes are known to cause inherited retinal degenerations (IRD), the genetic basis of disease remains unknown in nearly 40-50% of families. In this study, we sought to identify the underlying cause of IRD in a family by whole genome sequence (WGS) analysis. Clinical characterization including standard ophthalmic examination, fundus photography, visual field testing, electroretinography, and review of medical and family history was performed. WGS was performed on affected and unaffected family members using Illumina HiSeq X10. Sequence reads were aligned to hg19 using BWA-MEM and variant calling was performed with Genome Analysis Toolkit. The called variants were annotated with SnpEff v4.11, PolyPhen v2.2.2, and CADD v1.3. Copy number variations were called using Genome STRiP (svtoolkit 2.00.1611) and SpeedSeq software. Variants were filtered to detect rare potentially deleterious variants segregating with disease. Candidate variants were validated by dideoxy sequencing. Clinical evaluation revealed typical adolescent-onset recessive retinitis pigmentosa (arRP) in affected members. WGS identified about 4 million variants in each individual. Two rare and potentially deleterious compound heterozygous variants, p.Arg281Cys and p.Arg487*, were identified in the gene ATP/GTP binding protein like 5 (AGBL5) as likely causal variants. No additional variants in IRD genes that segregated with disease were identified. Mutation analysis confirmed the segregation of these variants with the IRD in the pedigree. Homology models indicated destabilization of AGBL5 due to the p.Arg281Cys change. Our findings establish the involvement of mutations in AGBL5 in RP and validate the WGS variant filtering pipeline we designed.


Subject(s)
Carboxypeptidases/genetics , Retinitis Pigmentosa/genetics , Adolescent , DNA Mutational Analysis , Electroretinography/methods , Female , Genetic Association Studies/methods , Humans , Male , Mutation/genetics , Pedigree , Retinal Degeneration/genetics , Whole Genome Sequencing/methods , Young Adult
15.
BMC Genomics ; 17 Suppl 5: 494, 2016 08 31.
Article in English | MEDLINE | ID: mdl-27586631

ABSTRACT

BACKGROUND: Two types of approaches are mainly considered for the repeat number estimation in short tandem repeat (STR) regions from high-throughput sequencing data: approaches directly counting repeat patterns included in sequence reads spanning the region and approaches based on detecting the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although the accuracy of repeat numbers estimated with the former approaches is high, the size of target STR regions is limited to the length of sequence reads. On the other hand, the latter approaches can handle STR regions longer than the length of sequence reads. However, repeat numbers estimated with the latter approaches are less accurate than those estimated with the former approaches. RESULTS: We proposed a new statistical model named coalescentSTR that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by handling coalescent trees as hidden variables, and the summation of the hidden variables is taken on coalescent trees sampled based on phased genotypes located around a target STR region with Markov chain Monte Carlo. In the sampled coalescent trees, repeat number information from insert size data is propagated, and more accurate estimation of repeat numbers is expected for STR regions longer than the length of sequence reads. For finding the repeat numbers maximizing the likelihood of the model on the estimation of repeat numbers, we proposed a state-of-the-art belief propagation algorithm on sampled coalescent trees. CONCLUSIONS: We verified the effectiveness of the proposed approach from the comparison with existing methods by using simulation datasets and real whole genome and whole exome data for HapMap individuals analyzed in the 1000 Genomes Project.


Subject(s)
Microsatellite Repeats , Algorithms , Computer Simulation , Genome, Human , Humans , Models, Statistical , Sequence Analysis, DNA
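Illustrative note: the abstract above describes insert-size-based estimation, where the gap between the expected insert size and the distance observed after alignment to the reference indicates how many repeat units the sample gains or loses. The Python sketch below is a deliberately naive single-allele version of that idea, with hypothetical numbers and function names. It is not coalescentSTR: it ignores diploidy, insert-size variance, and the genealogy-based sharing of information across individuals that the authors propose.

```python
from statistics import mean


def estimate_repeat_number(mapped_distances, library_insert_mean, ref_repeats, unit_length):
    """Naive repeat-count estimate from read pairs spanning an STR.

    If the sample carries more repeat units than the reference, read pairs
    map closer together on the reference than the true insert size, so the
    shortfall divided by the unit length approximates the extra units.
    """
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    extra_bases = library_insert_mean - mean(mapped_distances)
    return ref_repeats + extra_bases / unit_length


# Hypothetical values: a trinucleotide repeat with 10 units in the reference,
# a library with mean insert size 400 bp, and four spanning read pairs.
observed = [388, 391, 385, 388]
estimate = estimate_repeat_number(observed, library_insert_mean=400, ref_repeats=10, unit_length=3)
print(f"estimated repeat number: {estimate:.1f}")
```

Because each read pair's insert size varies around the library mean, an estimate from a few pairs is noisy, which matches the abstract's remark that this family of approaches is less accurate than direct counting.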
16.
BMC Genomics ; 17 Suppl 1: 2, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818838

ABSTRACT

BACKGROUND: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because it contains only one allele on a mosaic haploid genome. Even with the information of diploid genome sequences, precise alignment of reads to the correct allele is difficult because of the high similarity between the corresponding allele sequences. RESULTS: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through simulation data analysis, we demonstrate the effectiveness of the proposed approach in terms of identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels compared to the existing methods, TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified. CONCLUSIONS: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.


Subject(s)
Gene Expression Regulation , Genome, Human , RNA/metabolism , Algorithms , Alleles , Bayes Theorem , Cell Line, Tumor , Diploidy , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteins/genetics , Proteins/metabolism , RNA/chemistry , RNA/genetics , Sequence Analysis, RNA
17.
Nat Commun ; 6: 8018, 2015 Aug 21.
Article in English | MEDLINE | ID: mdl-26292667

ABSTRACT

The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify, through this high-coverage sequencing (32.4× on average), 21.2 million single-nucleotide variants (SNVs), including 12 million novel ones, at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which cover 99.0% of SNVs with minor allele frequency ≥0.1%, and its value for identifying causal rare variants of complex human disease phenotypes in genetic association studies.


Subject(s)
Asian People/genetics , Genetic Variation , Genome, Human , Haplotypes , Humans
18.
J Hum Genet ; 60(10): 581-7, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26108142

ABSTRACT

The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF) > 5%), the genomic coverage of the Japonica array (r² > 0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%

Subject(s)
Genotype , Genotyping Techniques/methods , Haplotypes , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Asian People , Chromosomes, Human, Y/genetics , DNA, Mitochondrial/genetics , Female , Genome-Wide Association Study , Humans , Japan , Male
19.
BMC Genomics ; 16 Suppl 2: S7, 2015.
Article in English | MEDLINE | ID: mdl-25708870

ABSTRACT

BACKGROUND: Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. RESULTS: We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and abundance of reads on HLA alleles by variational Bayesian inference. We show the effectiveness of the proposed method over other methods by predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1, -DQB1 and -DRB1) loci using simulated data at various depths of coverage and real sequencing data of human trio samples. CONCLUSIONS: HLA-VBSeq is an efficient and accurate HLA typing method using high-throughput sequencing data without the need for primer design for HLA loci. Moreover, it does not assume any prior knowledge about HLA allele frequencies, and hence HLA-VBSeq is broadly applicable to human samples obtained from a genetically diverse population.


Subject(s)
Computational Biology/methods , Genome, Human , HLA Antigens/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Histocompatibility Testing/statistics & numerical data , Algorithms , Alleles , Bayes Theorem , Gene Frequency , Genotype , High-Throughput Nucleotide Sequencing/methods , Histocompatibility Testing/methods , Humans , Internet , Polymorphism, Genetic , Reproducibility of Results
20.
BMC Bioinformatics ; 16 Suppl 1: S4, 2015.
Article in English | MEDLINE | ID: mdl-25707811

ABSTRACT

BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. RESULTS: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well-studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at the highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our proposed approach enables detailed analysis of copy number variations, such as association study between copy unit alleles and phenotypes or biological features including human diseases.


Subject(s)
Alleles , Computational Biology/methods , DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing , Amylases/genetics , Bayes Theorem , Female , Genetics, Population , Haplotypes , Humans , Male , Models, Statistical , Pedigree , Phenotype , Saliva/enzymology , Utah