Results 1 - 20 of 35
1.
Nature ; 591(7848): 147-151, 2021 03.
Article in English | MEDLINE | ID: mdl-33505025

ABSTRACT

Many sequence variants have been linked to complex human traits and diseases, but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor-DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas the support vector machine combined with the gapped k-mer representation shows much improved performance, when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and understanding of the molecular pathways involved in diverse human traits and diseases.


Subject(s)
Polymorphism, Single Nucleotide/genetics , SELEX Aptamer Technique , Support Vector Machine , Transcription Factors/metabolism , Binding Sites/genetics , Disease/genetics , Genome, Human/genetics , Humans , Ligands , Protein Binding
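
The abstract above evaluates position weight matrices (PWMs) as predictors of variant effects on transcription factor binding. As a rough illustration of what such a prediction looks like, the following minimal sketch scores a reference and an alternate allele with a log-odds PWM and reports the difference. The count matrix `COUNTS`, the pseudocount, the uniform background, and all function names are assumptions made for illustration; none of them comes from the SNP-SELEX study.

```python
import math

# Hypothetical 4-position count matrix whose preferred site is "ACGT".
COUNTS = [
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 7, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 7},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_window_score(pwm, sequence):
    """Return the highest PWM score over all windows of the sequence."""
    width = len(pwm)
    return max(
        sum(column[base] for column, base in zip(pwm, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    )


def delta_score(pwm, ref_sequence, alt_sequence):
    """Alternate-allele score minus reference-allele score (negative = weaker predicted binding)."""
    return best_window_score(pwm, alt_sequence) - best_window_score(pwm, ref_sequence)


PWM = log_odds_pwm(COUNTS)
DELTA = delta_score(PWM, "TTACGTTT", "TTACCTTT")  # G>C inside the motif
```

A negative `DELTA` means the alternate allele is predicted to bind less strongly. The abstract reports that this kind of model lacks predictive power for most factors, so the sketch shows the baseline being evaluated rather than the recommended method.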
2.
PLoS Comput Biol ; 16(6): e1007933, 2020 06.
Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.


Subject(s)
Genome, Human , Genomic Structural Variation , Heuristics , Humans , INDEL Mutation
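
The abstract above reports pairwise concordance between curators and the filtering of events on which curators disagreed. A minimal sketch of both calculations follows. The data layout, the function names, and the 0.75 agreement cutoff are hypothetical and are not taken from SVCurator.

```python
from collections import Counter


def concordance(labels_a, labels_b):
    """Fraction of events labelled by both curators on which the labels agree."""
    shared = [event for event in labels_a if event in labels_b]
    if not shared:
        return 0.0
    return sum(labels_a[e] == labels_b[e] for e in shared) / len(shared)


def consensus_labels(curations, min_fraction=0.75):
    """Keep only events whose most common label reaches the agreement cutoff.

    curations maps an event id to the list of labels assigned by all curators.
    The cutoff value is an illustrative assumption.
    """
    kept = {}
    for event, labels in curations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_fraction:
            kept[event] = label
    return kept


curator_1 = {"sv1": "DEL", "sv2": "INS", "sv3": "DEL"}
curator_2 = {"sv1": "DEL", "sv2": "DEL", "sv3": "DEL", "sv4": "INS"}
votes = {"sv1": ["DEL", "DEL", "DEL", "DEL"], "sv2": ["INS", "DEL", "INS", "DEL"]}
```

Here `concordance(curator_1, curator_2)` compares only the three events both curators reviewed, and `consensus_labels(votes)` drops `sv2` because its votes are evenly split.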
3.
Pharmacogenomics J ; 19(2): 136-146, 2019 04.
Article in English | MEDLINE | ID: mdl-29352165

ABSTRACT

Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), in which the SMRT sequencing and additional short-read data were used. The panel consisted of 139 alleles, which were all extended from known IPD-IMGT/HLA sequences, contained 40 with novel variants, and captured more than 96.5% of allelic diversity in 1KJPN. These newly available sequences would be important resources for research and clinical applications including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.


Subject(s)
Genetic Variation , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Histocompatibility Antigens Class I/genetics , Alleles , Genotype , Histocompatibility Testing , Humans , Japan , Sequence Analysis, DNA
4.
BMC Bioinformatics ; 18(1): 207, 2017 Apr 07.
Article in English | MEDLINE | ID: mdl-28388874

ABSTRACT

BACKGROUND: Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed. RESULTS: This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite "pgltools": a cross platform, pypy compatible python package available both as an easy-to-use UNIX package, and as a python module, for integration into pipelines of paired-genomic-loci analyses. CONCLUSIONS: Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools , and a python module of the operations can be installed from PyPI via the PyGLtools module.


Subject(s)
Chromatin/metabolism , Genomics/methods , Software , Chromatin/genetics , Chromatin Immunoprecipitation , Genetic Loci , High-Throughput Nucleotide Sequencing
5.
Physiol Genomics ; 48(12): 922-927, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27764769

ABSTRACT

While more than 250 genes are known to cause inherited retinal degenerations (IRD), the genetic basis of disease remains unknown in nearly 40-50% of families. In this study we sought to identify the underlying cause of IRD in a family by whole genome sequence (WGS) analysis. Clinical characterization including standard ophthalmic examination, fundus photography, visual field testing, electroretinography, and review of medical and family history was performed. WGS was performed on affected and unaffected family members using Illumina HiSeq X10. Sequence reads were aligned to hg19 using BWA-MEM and variant calling was performed with Genome Analysis Toolkit. The called variants were annotated with SnpEff v4.11, PolyPhen v2.2.2, and CADD v1.3. Copy number variations were called using Genome STRiP (svtoolkit 2.00.1611) and SpeedSeq software. Variants were filtered to detect rare potentially deleterious variants segregating with disease. Candidate variants were validated by dideoxy sequencing. Clinical evaluation revealed typical adolescent-onset recessive retinitis pigmentosa (arRP) in affected members. WGS identified about 4 million variants in each individual. Two rare and potentially deleterious compound heterozygous variants p.Arg281Cys and p.Arg487* were identified in the gene ATP/GTP binding protein like 5 (AGBL5) as likely causal variants. No additional variants in IRD genes that segregated with disease were identified. Mutation analysis confirmed the segregation of these variants with the IRD in the pedigree. Homology models indicated destabilization of AGBL5 due to the p.Arg281Cys change. Our findings establish the involvement of mutations in AGBL5 in RP and validate the WGS variant filtering pipeline we designed.


Subject(s)
Carboxypeptidases/genetics , Retinitis Pigmentosa/genetics , Adolescent , DNA Mutational Analysis , Electroretinography/methods , Female , Genetic Association Studies/methods , Humans , Male , Mutation/genetics , Pedigree , Retinal Degeneration/genetics , Whole Genome Sequencing/methods , Young Adult
6.
BMC Genomics ; 17 Suppl 1: 2, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818838

ABSTRACT

BACKGROUND: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because it contains only one allele on a mosaic haploid genome. Even with the information of diploid genome sequences, precise alignment of reads to the correct allele is difficult because of the high-similarity between the corresponding allele sequences. RESULTS: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through the simulation data analysis, we demonstrate the effectiveness of the proposed approach in terms of identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels compared to the existing methods, TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified. CONCLUSIONS: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar .


Subject(s)
Gene Expression Regulation , Genome, Human , RNA/metabolism , Algorithms , Alleles , Bayes Theorem , Cell Line, Tumor , Diploidy , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteins/genetics , Proteins/metabolism , RNA/chemistry , RNA/genetics , Sequence Analysis, RNA
7.
BMC Genomics ; 17 Suppl 5: 494, 2016 08 31.
Article in English | MEDLINE | ID: mdl-27586631

ABSTRACT

BACKGROUND: Two types of approaches are mainly considered for the repeat number estimation in short tandem repeat (STR) regions from high-throughput sequencing data: approaches directly counting repeat patterns included in sequence reads spanning the region and approaches based on detecting the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although the accuracy of repeat numbers estimated with the former approaches is high, the size of target STR regions is limited to the length of sequence reads. On the other hand, the latter approaches can handle STR regions longer than the length of sequence reads. However, repeat numbers estimated with the latter approaches are less accurate than those with the former approaches. RESULTS: We proposed a new statistical model named coalescentSTR that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by handling coalescent trees as hidden variables, and the summation of the hidden variables is taken on coalescent trees sampled based on phased genotypes located around a target STR region with Markov chain Monte Carlo. In the sampled coalescent trees, repeat number information from insert size data is propagated, and more accurate estimation of repeat numbers is expected for STR regions longer than the length of sequence reads. For finding the repeat numbers maximizing the likelihood of the model on the estimation of repeat numbers, we proposed a state-of-the-art belief propagation algorithm on sampled coalescent trees. CONCLUSIONS: We verified the effectiveness of the proposed approach from the comparison with existing methods by using simulation datasets and real whole genome and whole exome data for HapMap individuals analyzed in the 1000 Genomes Project.


Subject(s)
Microsatellite Repeats , Algorithms , Computer Simulation , Genome, Human , Humans , Models, Statistical , Sequence Analysis, DNA
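
The abstract above contrasts read-counting approaches with approaches based on the difference between the insert size inferred from aligned read pairs and the actual insert size. The second idea can be sketched as a naive point estimate: if a sample carries extra repeat units, read pairs spanning the region appear closer together on the reference than the library's mean insert size. This is only an illustration of that principle, not coalescentSTR; the function name and all parameters are hypothetical.

```python
from statistics import mean


def estimate_repeat_number(observed_inserts, library_mean_insert, ref_repeats, unit_length):
    """Naive repeat-number estimate from paired-end insert sizes.

    observed_inserts: insert sizes inferred from alignment to the reference
        for read pairs spanning the STR region.
    library_mean_insert: mean insert size of the sequencing library.
    ref_repeats: number of repeat units in the reference genome.
    unit_length: length of one repeat unit in base pairs.
    """
    # An expansion of d bp makes spanning pairs look d bp shorter on the reference.
    shift_bp = library_mean_insert - mean(observed_inserts)
    return ref_repeats + shift_bp / unit_length


ESTIMATE = estimate_repeat_number([388, 392, 384], 400, ref_repeats=10, unit_length=4)
```

Because it averages over noisy insert sizes for one individual, an estimate like this has high variance, which is consistent with the lower accuracy the abstract attributes to insert-size approaches.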
8.
Proc Natl Acad Sci U S A ; 110(8): 3023-8, 2013 Feb 19.
Article in English | MEDLINE | ID: mdl-23382209

ABSTRACT

The derivation of induced pluripotent stem (iPS) cells from individuals with genetic disorders offers new opportunities for basic research into these diseases and the development of therapeutic compounds. Severe congenital neutropenia (SCN) is a serious disorder characterized by severe neutropenia at birth. SCN is associated with heterozygous mutations in the neutrophil elastase [elastase, neutrophil-expressed (ELANE)] gene, but the mechanisms that disrupt neutrophil development have not yet been clarified because of the current lack of an appropriate disease model. Here, we generated iPS cells from an individual with SCN (SCN-iPS cells). Granulopoiesis from SCN-iPS cells revealed neutrophil maturation arrest and little sensitivity to granulocyte-colony stimulating factor, reflecting a disease status of SCN. Molecular analysis of the granulopoiesis from the SCN-iPS cells vs. control iPS cells showed reduced expression of genes related to the wingless-type MMTV integration site family, member 3a (Wnt3a)/β-catenin pathway [e.g., lymphoid enhancer-binding factor 1], whereas Wnt3a administration induced elevation of lymphoid enhancer-binding factor 1 expression and the maturation of SCN-iPS cell-derived neutrophils. These results indicate that SCN-iPS cells provide a useful disease model for SCN, and the activation of the Wnt3a/β-catenin pathway may offer a novel therapy for SCN with ELANE mutation.


Subject(s)
Neutropenia/immunology , Neutrophils/immunology , Pluripotent Stem Cells/immunology , Wnt3A Protein/physiology , Cell Differentiation/drug effects , Dose-Response Relationship, Drug , Granulocyte Colony-Stimulating Factor/pharmacology , Humans , Leukocyte Elastase/genetics , Mutation , Neutropenia/congenital , Neutropenia/pathology , Neutrophils/cytology , Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction
9.
BMC Bioinformatics ; 16 Suppl 1: S4, 2015.
Article in English | MEDLINE | ID: mdl-25707811

ABSTRACT

BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. RESULTS: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our proposed approach enables detailed analysis of copy number variations, such as association study between copy unit alleles and phenotypes or biological features including human diseases.


Subject(s)
Alleles , Computational Biology/methods , DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing , Amylases/genetics , Bayes Theorem , Female , Genetics, Population , Haplotypes , Humans , Male , Models, Statistical , Pedigree , Phenotype , Saliva/enzymology , Utah
10.
BMC Genomics ; 16 Suppl 2: S7, 2015.
Article in English | MEDLINE | ID: mdl-25708870

ABSTRACT

BACKGROUND: Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. RESULTS: We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and abundance of reads on HLA alleles by variational Bayesian inference. We show the effectiveness of the proposed method over other methods through the analysis of predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1,-DQB1 and -DRB1) loci from the simulation data of various depth of coverage, and real sequencing data of human trio samples. CONCLUSIONS: HLA-VBSeq is an efficient and accurate HLA typing method using high-throughput sequencing data without the need of primer design for HLA loci. Moreover, it does not assume any prior knowledge about HLA allele frequencies, and hence HLA-VBSeq is broadly applicable to human samples obtained from a genetically diverse population.


Subject(s)
Computational Biology/methods , Genome, Human , HLA Antigens/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Histocompatibility Testing/statistics & numerical data , Algorithms , Alleles , Bayes Theorem , Gene Frequency , Genotype , High-Throughput Nucleotide Sequencing/methods , Histocompatibility Testing/methods , Humans , Internet , Polymorphism, Genetic , Reproducibility of Results
11.
J Hum Genet ; 60(10): 581-7, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26108142

ABSTRACT

The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF) > 5%), the genomic coverage of the Japonica array (r² > 0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%

Subject(s)
Genotype , Genotyping Techniques/methods , Haplotypes , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Asian People , Chromosomes, Human, Y/genetics , DNA, Mitochondrial/genetics , Female , Genome-Wide Association Study , Humans , Japan , Male
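
The abstract above reports genomic coverage as the share of common SNPs (MAF > 5%) captured at r² > 0.8. A minimal sketch of that calculation follows, assuming each SNP is given as a `(maf, r2)` pair, where `r2` is the best squared correlation between that SNP and the array content or imputed genotypes. The input layout and function name are hypothetical; the two thresholds are the ones quoted in the abstract.

```python
def genomic_coverage(snps, min_maf=0.05, r2_threshold=0.8):
    """Fraction of SNPs above the MAF cutoff whose r2 exceeds the threshold.

    snps: iterable of (maf, r2) pairs; this layout is an illustrative assumption.
    """
    selected = [r2 for maf, r2 in snps if maf > min_maf]
    if not selected:
        raise ValueError("no SNPs above the MAF cutoff")
    return sum(r2 > r2_threshold for r2 in selected) / len(selected)


example_snps = [(0.10, 0.95), (0.20, 0.85), (0.30, 0.70), (0.01, 0.20)]
COVERAGE = genomic_coverage(example_snps)
```

In this toy input, the SNP with MAF 0.01 is excluded before the fraction is computed, so coverage is evaluated over the three common SNPs only.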
12.
Proc Natl Acad Sci U S A ; 109(44): E2998-3007, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-23045694

ABSTRACT

Neural stem cells (NSCs) are considered to be the cell of origin of glioblastoma multiforme (GBM). However, the genetic alterations that transform NSCs into glioma-initiating cells remain elusive. Using a unique transposon mutagenesis strategy that mutagenizes NSCs in culture, followed by additional rounds of mutagenesis to generate tumors in vivo, we have identified genes and signaling pathways that can transform NSCs into glioma-initiating cells. Mobilization of Sleeping Beauty transposons in NSCs induced the immortalization of astroglial-like cells, which were then able to generate tumors with characteristics of the mesenchymal subtype of GBM on transplantation, consistent with a potential astroglial origin for mesenchymal GBM. Sequence analysis of transposon insertion sites from tumors and immortalized cells identified more than 200 frequently mutated genes, including human GBM-associated genes, such as Met and Nf1, and made it possible to discriminate between genes that function during astroglial immortalization vs. later stages of tumor development. We also functionally validated five GBM candidate genes using a previously undescribed high-throughput method. Finally, we show that even clonally related tumors derived from the same immortalized line have acquired distinct combinations of genetic alterations during tumor development, suggesting that tumor formation in this model system involves competition among genetically variant cells, which is similar to the Darwinian evolutionary processes now thought to generate many human cancers. This mutagenesis strategy is faster and simpler than conventional transposon screens and can potentially be applied to any tissue stem/progenitor cells that can be grown and differentiated in vitro.


Subject(s)
Brain Neoplasms/pathology , DNA Transposable Elements , Glioblastoma/pathology , Mutagenesis , Neural Stem Cells/cytology , Animals , Cell Transformation, Neoplastic , Humans , Mice
13.
BMC Genomics ; 15: 664, 2014 Aug 08.
Article in English | MEDLINE | ID: mdl-25103311

ABSTRACT

BACKGROUND: Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. RESULTS: We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality was improved. CONCLUSION: The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls for low-population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.


Subject(s)
Computational Biology/methods , Computer Graphics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Statistics as Topic/methods , User-Computer Interface , Humans
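
The abstract above describes automated cleaning of flowcell regions based on read quality (Phred) scores. The sketch below shows the general idea: compute each read's mean Phred score, average per region, and flag regions that fall below a cutoff. It assumes Phred+33 quality strings and that each read already carries a region identifier (`tile_id`). The cutoff of 20 and all names are hypothetical and do not describe SUGAR's actual implementation.

```python
from collections import defaultdict


def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def low_quality_tiles(reads, min_mean_quality=20.0):
    """Return the sorted ids of regions whose average per-read mean quality is below the cutoff.

    reads: iterable of (tile_id, quality_string) pairs (hypothetical layout).
    """
    per_tile = defaultdict(list)
    for tile_id, quality_string in reads:
        per_tile[tile_id].append(mean_phred(quality_string))
    return sorted(
        tile for tile, scores in per_tile.items()
        if sum(scores) / len(scores) < min_mean_quality
    )


example_reads = [
    ("tile_1", "IIII"),  # 'I' encodes Phred 40
    ("tile_1", "IIII"),
    ("tile_2", "####"),  # '#' encodes Phred 2
    ("tile_2", "##II"),
]
FLAGGED = low_quality_tiles(example_reads)
```

Reads from the flagged regions could then be dropped before mapping, which is the kind of selective removal the abstract describes.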
14.
BMC Genomics ; 15 Suppl 10: S5, 2014.
Article in English | MEDLINE | ID: mdl-25560536

ABSTRACT

BACKGROUND: High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to the development of new types of sequencing technologies as well as improvements in chemical reagents for the Next Generation Sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g. >250 bp). RESULTS: We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed and variable length RNA-Seq data. Our method models substitution, deletion, and insertion errors of sequencers based on gapped-alignments of reads to the reference cDNA sequences so that sensitive read-aligners such as Bowtie2 and BWA-MEM are effectively incorporated in our pipeline. Also, a heuristic algorithm is implemented in variational Bayesian inference for faster computation. We apply TIGAR2 to both simulation data and real data of human samples and evaluate performance of transcript quantification with TIGAR2 in comparison to existing methods. CONCLUSIONS: TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for the fixed-length reads (100 bp, 250 bp, 500 bp, and 1000 bp of both single-end and paired-end) and variable-length reads, especially for reads longer than 250 bp.


Subject(s)
Computational Biology/methods , RNA Isoforms/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Algorithms , Bayes Theorem , Gene Expression Profiling , Genetic Variation , HeLa Cells , Humans , Software
15.
BMC Genomics ; 15: 673, 2014 Aug 10.
Article in English | MEDLINE | ID: mdl-25109789

ABSTRACT

BACKGROUND: Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously. RESULTS: Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%. CONCLUSIONS: Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.


Subject(s)
Exome/genetics , Genomics/instrumentation , Polymorphism, Single Nucleotide , Semiconductors , Sequence Analysis, DNA/instrumentation , Base Composition , Female , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Humans , Male , Reproducibility of Results
16.
Bioinformatics ; 29(18): 2292-9, 2013 Sep 15.
Article in English | MEDLINE | ID: mdl-23821651

ABSTRACT

MOTIVATION: Many human genes express multiple transcript isoforms through alternative splicing, which greatly increases diversity of protein function. Although RNA sequencing (RNA-Seq) technologies have been widely used in measuring amounts of transcribed mRNA, accurate estimation of transcript isoform abundances from RNA-Seq data is challenging because reads often map to more than one transcript isoform or to paralogs whose sequences are similar to each other. RESULTS: We propose a statistical method to estimate transcript isoform abundances from RNA-Seq data. Our method can handle gapped alignments of reads against reference sequences so that it allows insertion or deletion errors within reads. The proposed method optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure, and its convergence is guaranteed under a stopping criterion. On simulated datasets, our method outperformed the comparable quantification methods in inferring transcript isoform abundances, and at the same time its rate of convergence was faster than that of the expectation maximization algorithm. We also applied our method to RNA-Seq data of human cell line samples, and showed that our prediction result was more consistent among technical replicates than those of other methods. AVAILABILITY: An implementation of our method is available at http://github.com/nariai/tigar CONTACT: nariai@megabank.tohoku.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , RNA Isoforms/analysis , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Algorithms , Alternative Splicing , Bayes Theorem , Cell Line , Humans , RNA Isoforms/metabolism
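
The abstract above compares its variational Bayesian method with the expectation maximization (EM) algorithm for reads that map to more than one isoform. For orientation, here is a minimal sketch of that EM baseline, not of the TIGAR method itself. It assumes each read is reduced to the list of isoform indices it is compatible with, and it ignores transcript length, alignment quality, and sequencing errors. All names are hypothetical.

```python
def em_isoform_abundance(read_compatibility, n_isoforms, n_iterations=100):
    """Estimate relative isoform abundances from ambiguous read assignments.

    read_compatibility: list of lists; each inner list holds the indices of the
        isoforms a read maps to (a simplifying assumption for illustration).
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iterations):
        expected_counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms by current abundance.
        for compatible in read_compatibility:
            if not compatible:
                continue  # skip unmapped reads
            norm = sum(theta[t] for t in compatible)
            for t in compatible:
                expected_counts[t] += theta[t] / norm
        # M-step: renormalise expected counts into abundances.
        total = sum(expected_counts)
        theta = [count / total for count in expected_counts]
    return theta


# Two reads unique to isoform 0, one unique to isoform 1, one shared by both.
THETA = em_isoform_abundance([[0], [0], [0, 1], [1]], n_isoforms=2)
```

The shared read is divided in proportion to the abundances implied by the unique reads, so the estimate settles near two thirds for isoform 0 and one third for isoform 1.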
17.
Bioinformatics ; 29(22): 2835-43, 2013 Nov 15.
Article in English | MEDLINE | ID: mdl-24002111

ABSTRACT

MOTIVATION: Variant calling from genome-wide sequencing data is essential for the analysis of disease-causing mutations and elucidation of disease mechanisms. However, variant calling in low coverage regions is difficult due to sequence read errors and mapping errors. Hence, variant calling approaches that are robust to low coverage data are demanded. RESULTS: We propose a new variant calling approach that considers pedigree information and haplotyping based on sequence reads spanning two or more heterozygous positions termed phase informative reads. In our approach, genotyping and haplotyping by the assignment of each read to a haplotype based on phase informative reads are simultaneously performed. Therefore, positions with low evidence for heterozygosity are rescued by phase informative reads, and such rescued positions contribute to haplotyping in a synergistic way. In addition, pedigree information supports more accurate haplotyping as well as genotyping, especially in low coverage regions. Although heterozygous positions are useful for haplotyping, homozygous positions are not informative and weaken the information from heterozygous positions, as the majority of positions are homozygous. Thus, we introduce latent variables that determine zygosity at each position to filter out homozygous positions for haplotyping. In performance evaluation with parent-offspring trio sequencing data, our approach outperforms existing approaches in accuracy on the agreement with single nucleotide polymorphism array genotyping results. Also, performance analysis considering distance between variants showed that the use of phase informative reads is effective for accurate variant calling, and further performance improvement is expected with longer sequencing data. CONTACT: kojima@megabank.tohoku.ac.jp .


Subject(s)
Genetic Variation , Genotyping Techniques , Haplotypes , Models, Statistical , Pedigree , Sequence Analysis, DNA/methods , Genomics/methods , Humans , Polymorphism, Single Nucleotide
18.
Ann Hematol ; 93(9): 1515-22, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24782121

ABSTRACT

Heterozygous GATA-2 germline mutations are associated with overlapping clinical manifestations termed GATA-2 deficiency, characterized by immunodeficiency and predisposition to myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML). However, there is considerable clinical heterogeneity among patients, and the molecular basis for the evolution of immunodeficiency into MDS/AML remains unknown. Thus, we conducted whole-genome sequencing on a patient with a germline GATA-2 heterozygous mutation (c.988C>T; p.R330X), who had a history suggestive of immunodeficiency that evolved into MDS/AML. Analysis was conducted with DNA samples from leukocytes for immunodeficiency, bone marrow mononuclear cells for MDS, and bone marrow-derived mesenchymal stem cells. Although we did not identify a candidate genomic deletion that may contribute to the evolution into MDS, a total of 280 MDS-specific nonsynonymous single nucleotide variants were identified. By narrowing these down with the single nucleotide polymorphism database, the functional missense database, and NCBI information, we finally identified three candidate mutations, in EZH2, HECW2 and GATA-1, which may contribute to the evolution of the disease.


Subject(s)
DNA Mutational Analysis/methods , GATA2 Transcription Factor/genetics , Myeloid Leukemia/genetics , Myelodysplastic Syndromes/genetics , Acute Disease , Adult , Cultured Cells , Disease Progression , Human Genome , High-Throughput Nucleotide Sequencing , Humans , K562 Cells , Male , Mutation
19.
Mol Neurobiol ; 60(2): 1083-1098, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36414910

ABSTRACT

Schizophrenia presents clinical and biological differences between males and females. This study investigated transcriptional profiles in the dorsolateral prefrontal cortex (DLPFC) using postmortem data from the largest RNA-sequencing (RNA-seq) database on schizophrenic cases and controls. Data for 154 male and 113 female controls and 160 male and 93 female schizophrenic cases were obtained from the CommonMind Consortium. In the RNA-seq database, principal component analysis showed that sex effects were small in schizophrenia. After we analyzed the impact of sex-specific differences on gene expression, the female group showed more significantly changed genes than the male group. Based on the gene ontology analysis, the female sex-specific genes that changed were overrepresented in the mitochondrion, ATP (phosphocreatine and adenosine triphosphate)-, and metal ion-binding relevant biological processes. An ingenuity pathway analysis revealed that the differentially expressed genes related to schizophrenia in the female group were involved in midbrain dopaminergic and γ-aminobutyric acid (GABA)-ergic neurons and microglia. We used methylated DNA-binding domain-sequencing analyses and microarray to investigate the DNA methylation that potentially impacts the sex differences in gene transcription using a maternal immune activation (MIA) murine model. Among the sex-specific positional genes related to schizophrenia in the PFC of female offspring from MIA, the changes in the methylation and transcriptional expression of the ACSBG1 locus were validated in females with schizophrenia in independent postmortem samples by real-time PCR and pyrosequencing. Our results reveal potential genetic risks in the DLPFC for the sex-dependent prevalence and symptomatology of schizophrenia.


Subject(s)
Schizophrenia , Animals , Female , Humans , Male , Mice , Dorsolateral Prefrontal Cortex , Prefrontal Cortex/metabolism , Schizophrenia/genetics , Schizophrenia/metabolism , Sex Characteristics , Transcriptome/genetics