Results 1 - 20 of 22
1.
J Nutr Sci Vitaminol (Tokyo) ; 69(5): 347-356, 2023.
Article in English | MEDLINE | ID: mdl-37940575

ABSTRACT

Human type 2 taste receptor (TAS2R) genes encode bitter-taste receptors that are activated by various bitter ligands. TAS2R38 has been proposed to detect bitter substances and then suppress their intake by modulating gustatory or digestive responses. The major haplotypes of TAS2R38 involve three closely linked non-synonymous single-nucleotide polymorphisms (SNPs) that cause three amino acid substitutions (A49P, V262A and I296V), yielding either a PAV or an AVI allele. In this study, the AVI/PAV allele frequencies were 0.42/0.58. The TAS2R38 genotype frequencies were 18.32%, 46.95% and 33.95% for AVI/AVI, AVI/PAV and PAV/PAV, respectively, and were in Hardy-Weinberg equilibrium. Among 2,047 Japanese Tohoku Medical Megabank Organization (ToMMo) subjects (2KJPN), five haplotype combinations involving minor alleles were identified: AVI/AAV, AVI/AVV, AAI/PAV, AVI/PVV and AVI/AAI, with frequencies of 0.49%, 0.10%, 0.10%, 0.05% and 0.05%, respectively. The 16 subjects carrying these minor alleles were excluded from the questionnaire analysis, which found no significant differences among the major TAS2R38 genotypes (AVI/AVI, AVI/PAV and PAV/PAV) in the frequency of cruciferous vegetable intake or of alcohol consumption. This result differs from previous findings in American and European subjects. This is the first study to analyze the relationship between TAS2R38 genotype and the eating and drinking habits of Japanese subjects. The study also found no relationship between the TAS2R46 polymorphism at position W250∗ (∗, stop codon) and phenotypes such as clinical BMI or eating and drinking habits across the three TAS2R46 genotypes (∗/∗, ∗/W and W/W).
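The Hardy-Weinberg statement above can be made concrete with a short Python sketch that derives the genotype frequencies expected from the reported AVI allele frequency (0.42) and sets them beside the observed ones. It only compares proportions; a formal equilibrium test (for example a chi-square or exact test) would need the genotype counts, and the function name is illustrative. The observed percentages sum to slightly less than 100%, presumably because carriers of the minor haplotype combinations are not counted in them.

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies at a biallelic locus under Hardy-Weinberg equilibrium.

    p is the frequency of the first allele (AVI here) and q = 1 - p (PAV).
    Returns (p^2, 2pq, q^2) for AVI/AVI, AVI/PAV and PAV/PAV.
    """
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Allele and genotype frequencies as reported in the abstract.
expected = hardy_weinberg_expected(0.42)
observed = (0.1832, 0.4695, 0.3395)
for label, e, o in zip(("AVI/AVI", "AVI/PAV", "PAV/PAV"), expected, observed):
    print(f"{label}: expected {e:.4f}, observed {o:.4f}")
```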


Subjects
East Asian People; Receptors, G-Protein-Coupled; Taste; Humans; Genotype; Polymorphism, Single Nucleotide; Receptors, G-Protein-Coupled/genetics; Receptors, G-Protein-Coupled/metabolism; Taste/genetics; Drinking Behavior; Diet
2.
Chemistry ; 29(53): e202301133, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37404204

ABSTRACT

A microdroplet co-culture system is useful for assessing many possible cell-cell interactions in parallel, because it generates isolated subcommunities from a pool of heterogeneous cells. However, integrating single-cell sequencing into such analyses has been limited by the lack of effective molecular identifiers for each in-droplet subcommunity. Herein, we present a strategy for generating in-droplet subcommunity identifiers using DNA-functionalized microparticles encapsulated within microdroplets. These microparticles serve as the initial information carriers, and their combinations act as distinct identifiers for each in-droplet subcommunity. Upon an optical trigger, DNA barcoding molecules encoding the microparticle information are released into the microdroplets and tag the cell membranes. The tagged DNA molecules then serve as a second information carrier, readable by single-cell sequencing, that allows the community to be reconstituted in silico in the single-cell RNA sequencing data space.
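A minimal Python sketch of combinatorial identifiers, assuming a pure presence/absence code: with n distinguishable microparticle types, each non-empty subset can label one droplet, giving 2^n - 1 identifiers. The particle names (`P1`, `P2`, ...) and both functions are hypothetical; the particle chemistry, barcode design and decoding used in the study may differ, and two droplets that happen to receive the same combination would collide under this scheme.

```python
from itertools import combinations


def droplet_identifiers(particle_types):
    """Enumerate every non-empty combination of microparticle types.

    Each combination is one candidate in-droplet subcommunity identifier
    (presence/absence only; how many copies of a type are present is ignored).
    """
    ids = []
    for k in range(1, len(particle_types) + 1):
        ids.extend(frozenset(c) for c in combinations(particle_types, k))
    return ids


def decode(observed_barcodes, known_ids):
    """Map the barcode set read from one cell back to its droplet identifier, or None."""
    key = frozenset(observed_barcodes)
    return key if key in known_ids else None


ids = droplet_identifiers(["P1", "P2", "P3"])  # hypothetical particle types
print(len(ids))  # 2**3 - 1 combinations
print(decode({"P1", "P3"}, set(ids)))
```

Because identifiers grow exponentially with the number of particle types, a handful of types already labels hundreds of droplets; keeping that space much larger than the droplet count keeps accidental collisions rare.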


Subjects
DNA Barcoding, Taxonomic; DNA
3.
Biol Methods Protoc ; 6(1): bpab006, 2021.
Article in English | MEDLINE | ID: mdl-33928190

ABSTRACT

Advances in experimental technologies such as DNA sequencing have opened new avenues for applying phylogenetic methods beyond their traditional use in evolutionary investigations, extending to development, differentiation, cancer genomics and immunogenomics. The importance of phylogenetic methods is therefore increasingly recognized, and a novel phylogenetic approach can contribute to several areas of research. Recently, hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space represents hierarchical structures better than Euclidean space and can therefore be useful for describing and analyzing phylogenetic trees. In this study, we developed a novel metric that accounts for the characteristics of a phylogenetic tree when representing it in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings and Euclidean embeddings, and confirmed that our method reconstructs evolutionary distances more precisely. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we propose a novel approach, based on our metric, that integrates multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of a geometric approach for further advancing the applications of phylogenetic methods.
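For orientation, the Python sketch below computes the standard geodesic distance in the Poincaré ball model, the kind of distance that general hyperbolic embeddings rely on: d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))). It is not the tree-aware metric proposed in the study, whose form the abstract does not give. The example shows why hyperbolic space suits trees: equal Euclidean gaps correspond to much larger distances near the boundary, where the many leaves of a deep tree can be spread out.

```python
import math


def _sq_norm(x):
    """Squared Euclidean norm of a coordinate sequence."""
    return sum(c * c for c in x)


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    nu, nv = _sq_norm(u), _sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# Two pairs with the same Euclidean gap (0.1): the pair near the boundary
# is much farther apart in hyperbolic distance.
print(poincare_distance([0.0, 0.0], [0.1, 0.0]))
print(poincare_distance([0.85, 0.0], [0.95, 0.0]))
```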

4.
Clin Cancer Res ; 25(22): 6756-6763, 2019 Nov 15.
Article in English | MEDLINE | ID: mdl-31383733

ABSTRACT

PURPOSE: The epithelial-to-mesenchymal transition, the major process by which some cancer cells convert from an epithelial to a mesenchymal phenotype, has been suggested to drive chemoresistance and/or metastasis in patients with cancer. However, only a few studies have demonstrated the presence of CD45/CD326 doubly positive cells (CD45/CD326 DPC) in cancer. We used a combination of cell-surface markers to elucidate the phenotypic heterogeneity of non-small cell lung cancer (NSCLC) cells and identified a new subpopulation that is doubly positive for epithelial and non-epithelial cell-surface markers in both NSCLC cells and patients' malignant pleural effusions. EXPERIMENTAL DESIGN: We obtained samples from a total of 39 patients: fresh solid lung cancer tissues from 21 patients and malignant pleural effusions from 18 others. We used FACS and fluorescence microscopy to examine their surface markers, and we examined EGFR mutations in patients with known acquired EGFR mutations. RESULTS: CD45/CD326 DPC expressing both epithelial and non-epithelial surface markers accounted for 0.4% to 17.9% of solid tumor tissue cells and for a higher percentage of malignant pleural effusion cells. In 3 patients with EGFR mutations, we genetically confirmed that the newly identified cell population originated from cancer cells. Higher proportions of CD45/CD326 DPC were also significantly associated with poor prognosis. CONCLUSIONS: Varying percentages of CD45/CD326 DPC exist in both solid cancer tissue and malignant pleural effusion in patients with NSCLC. This CD45/CD326 doubly positive subpopulation may be an important factor in the clinical management of patients with NSCLC.


Subjects
Carcinoma, Non-Small-Cell Lung/metabolism; Carcinoma, Non-Small-Cell Lung/mortality; Epithelial Cell Adhesion Molecule/metabolism; Leukocyte Common Antigens/metabolism; Lung Neoplasms/metabolism; Lung Neoplasms/mortality; Biomarkers; Carcinoma, Non-Small-Cell Lung/pathology; DNA Mutational Analysis; Epithelial-Mesenchymal Transition; ErbB Receptors/genetics; Female; Humans; Immunophenotyping; Lung Neoplasms/pathology; Male; Mutation; Prognosis
5.
Hum Genome Var ; 6: 27, 2019.
Article in English | MEDLINE | ID: mdl-31231536

ABSTRACT

In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions, so the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized the insertions found in the assembled genome. We identified 3,691 insertions, ranging from 100 bp to ~10,000 bp, in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1,070 Japanese individuals and 728 individuals from eight other populations to the insertions integrated into GRCh38. On the basis of this result, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating into GRCh38 the 903 verified insertions, totaling 1,086,173 bases, that were shared by at least two Japanese individuals. We also constructed decoyJRGv1 by concatenating 3,559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.

6.
Hum Genome Var ; 6: 29, 2019.
Article in English | MEDLINE | ID: mdl-31240105

ABSTRACT

HLA-VBSeq is an HLA calling tool developed to infer the most likely HLA types from high-throughput sequencing data. However, there is still room for improvement for specific genetic groups because of the diversity of HLA alleles in human populations. Here, we present HLA-VBSeq v2, a software application that uses a new Japanese HLA reference panel to enhance calling accuracy for Japanese HLA class I genes. Our analysis showed significant improvements in calling accuracy in all HLA regions, with prediction accuracies exceeding 99.0%, 97.8% and 99.8% for HLA-A, -B and -C, respectively.

7.
BMJ Open ; 9(2): e025939, 2019 Feb 19.
Article in English | MEDLINE | ID: mdl-30782942

ABSTRACT

PURPOSE: A prospective cohort study of pregnant women, the Maternity Log study, was designed to construct a time-course, high-resolution reference catalogue of bioinformatic data in pregnancy and to explore the associations of genomic and environmental factors with the onset of pregnancy complications, such as hypertensive disorders of pregnancy, gestational diabetes mellitus and preterm labour, using continuous lifestyle monitoring combined with multiomics data on the genome, transcriptome, proteome, metabolome and microbiome. PARTICIPANTS: Pregnant women were recruited at their first routine antenatal visit at Tohoku University Hospital, Sendai, Japan, between September 2015 and November 2016. Of the eligible women who were invited, 65.4% agreed to participate, and a total of 302 women were enrolled. The inclusion criteria were age ≥20 years and the ability to access the internet in Japanese using a smartphone. FINDINGS TO DATE: Study participants uploaded daily general health information, including quality of sleep, bowel movement condition and the presence of nausea, pain and uterine contractions. Participants also collected physiological data, such as body weight, blood pressure, heart rate and body temperature, using multiple home healthcare devices. The mean upload rate for each lifelog item ranged from 67.4% (fetal movement) to 85.3% (physical activity), and the total number of data points exceeded 6 million. Biospecimens, including maternal plasma, serum, urine, saliva, dental plaque and cord blood, were collected for multiomics analysis. FUTURE PLANS: Lifelog and multiomics data will be used to construct a time-course, high-resolution reference catalogue of pregnancy. This catalogue will allow us to discover relationships among multidimensional phenotypes, as well as novel risk markers in pregnancy, for future personalised early prediction of pregnancy complications.


Subjects
Life Style; Metabolome; Microbiota; Pregnancy Complications/diagnosis; Proteome; Transcriptome; Adult; Computational Biology; Female; Humans; Japan; Middle Aged; Pregnancy; Prospective Studies; Young Adult
8.
Pharmacogenomics J ; 19(2): 136-146, 2019 Apr.
Article in English | MEDLINE | ID: mdl-29352165

ABSTRACT

Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, its importance in organ and blood stem cell transplantation, and the associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, the Primer-Separation Assembly and Refinement Pipeline (PSARP), which uses the SMRT sequencing data together with additional short-read data. The panel consisted of 139 alleles, all of which were extended from known IPD-IMGT/HLA sequences; 40 of them contained novel variants, and the panel captured more than 96.5% of the allelic diversity in 1KJPN. These newly available sequences should be important resources for research and clinical applications, including high-resolution HLA typing, genetic association studies and analyses of cis-regulatory elements.


Subjects
Genetic Variation; Genome, Human/genetics; High-Throughput Nucleotide Sequencing; Histocompatibility Antigens Class I/genetics; Alleles; Genotype; Histocompatibility Testing; Humans; Japan; Sequence Analysis, DNA
9.
J Biochem ; 165(2): 139-158, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30452759

ABSTRACT

Personalized healthcare (PHC) based on an individual's genetic make-up is one of the most advanced, yet feasible, forms of medical care. The Tohoku Medical Megabank (TMM) Project aims to combine population genomics, medical genetics and prospective cohort studies to develop a critical infrastructure for the establishment of PHC. To date, the TMM CommCohort (adult general population) and the TMM BirThree Cohort (birth + three-generation families) have carried out recruitment and baseline surveys. Genome analyses within the TMM Project will aid in developing a high-fidelity whole-genome Japanese reference panel, in designing custom single-nucleotide polymorphism (SNP) arrays specific to the Japanese population, and in estimating the biological significance of genetic variations through linked investigations of the cohorts. Whole-genome sequencing of >3,500 unrelated Japanese individuals and establishment of a Japanese reference genome sequence from long-read data have been completed. We next aim to obtain genotype data for all TMM cohort participants (>150,000) using our custom SNP arrays. These data will help identify disease-associated genomic signatures in the Japanese population, while genomic data from TMM BirThree Cohort participants will be used to improve the reference genome panel. Follow-up of the cohort participants will allow us to test the genetic markers and, consequently, contribute to the realization of PHC.


Subjects
Asian People/genetics; Genetics, Medical/trends; Genome, Human/genetics; Genomics; Precision Medicine/trends; Cohort Studies; Female; Humans; Japan; Male; Middle Aged; Polymorphism, Single Nucleotide/genetics; Reference Standards
10.
BMC Genomics ; 17(1): 991, 2016 Dec 03.
Article in English | MEDLINE | ID: mdl-27912743

ABSTRACT

BACKGROUND: Two main strategies are used to estimate repeat numbers in a short tandem repeat (STR) region from high-throughput sequencing data: one counts the repeat patterns contained in sequence reads spanning the region, and the other estimates the difference between the actual insert size and the insert size inferred from paired-end reads. Alignment quality is crucial, especially for the former strategy, yet standard alignment methods have difficulty in STR regions because of the insertions and deletions caused by variation in repeat number. RESULTS: We propose a new dynamic-programming-based realignment method, named STR-realigner, that uses the repeat patterns in STR regions as prior knowledge. Because changes in the size of repeat patterns within STR regions incur only a low penalty, accurate realignment is expected. For performance evaluation, publicly available STR variant calling tools were applied to the following sets of aligned reads: synthetically generated sequencing reads aligned with BWA-MEM, and the same reads realigned with STR-realigner, with ReviSTER and with GATK IndelRealigner. In a comparison of root mean squared errors between estimated and true STR region sizes, the dataset realigned with STR-realigner gave better results than the other cases. For real data analysis, we used an Illumina HiSeq 2000 sequencing dataset from a parent-offspring trio. RepeatSeq and lobSTR were applied to the reads from these individuals aligned with BWA-MEM and realigned with STR-realigner, ReviSTER and GATK IndelRealigner. STR-realigner showed the best performance in terms of the Mendelian consistency of the estimated STR region sizes. Root mean squared errors were also calculated by comparing these estimates with STR region sizes obtained from high-coverage PacBio sequencing data, and the data realigned with STR-realigner gave the lowest (best) root mean squared error. CONCLUSIONS: The effectiveness of the proposed realignment method for STR regions was verified by comparison with existing methods on both simulated datasets and a real whole-genome sequencing dataset.
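The core idea of STR-realigner, that changing the number of repeat units inside an STR should be cheap, can be illustrated with a toy dynamic program. The Python sketch below is a simplification under stated assumptions, not the published scoring scheme: it computes only the minimum alignment cost (no traceback), uses unit mismatch and gap costs, and lets one whole copy of `motif` be inserted or deleted for `unit_penalty`. The function and parameter names are hypothetical.

```python
def str_aware_edit_cost(ref, read, motif, unit_penalty=0.5, mismatch=1.0, gap=1.0):
    """Minimum cost of aligning read to ref when a whole repeat unit can be
    inserted or deleted for unit_penalty instead of len(motif) single gaps."""
    k = len(motif)
    n, m = len(ref), len(read)
    inf = float("inf")
    # dp[i][j] = best cost of aligning ref[:i] with read[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:  # match or mismatch
                step = 0.0 if ref[i - 1] == read[j - 1] else mismatch
                best = min(best, dp[i - 1][j - 1] + step)
            if i > 0:  # single-base deletion from the reference
                best = min(best, dp[i - 1][j] + gap)
            if j > 0:  # single-base insertion in the read
                best = min(best, dp[i][j - 1] + gap)
            if i >= k and ref[i - k:i] == motif:  # delete one whole repeat unit
                best = min(best, dp[i - k][j] + unit_penalty)
            if j >= k and read[j - k:j] == motif:  # insert one whole repeat unit
                best = min(best, dp[i][j - k] + unit_penalty)
            dp[i][j] = best
    return dp[n][m]


# A read carrying one fewer "AC" unit than the reference.
print(str_aware_edit_cost("ACACACAC", "ACACAC", "AC"))
```

Setting `unit_penalty` to the motif length times the gap cost makes the repeat-aware move no better than ordinary gaps, so the program reduces to a plain edit distance; comparing the two settings shows what the repeat-aware move contributes.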


Subjects
Microsatellite Repeats; Sequence Alignment/methods; Software; Algorithms; Computational Biology/methods; Genomics/methods; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Sequence Analysis, DNA/methods
11.
BMC Genomics ; 17 Suppl 5: 494, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27586631

ABSTRACT

BACKGROUND: Two main types of approaches are used to estimate repeat numbers in short tandem repeat (STR) regions from high-throughput sequencing data: approaches that directly count the repeat patterns contained in sequence reads spanning the region, and approaches that detect the difference between the insert size inferred from aligned paired-end reads and the actual insert size. Although repeat numbers estimated with the former approaches are highly accurate, the size of the target STR regions is limited by the sequence read length. The latter approaches, by contrast, can handle STR regions longer than the read length, but the repeat numbers they estimate are less accurate than those from the former approaches. RESULTS: We propose a new statistical model, named coalescentSTR, that estimates repeat numbers from paired-end read distances for multiple individuals simultaneously by connecting the read generative model for each individual with their genealogy. In the model, the genealogy is represented by treating coalescent trees as hidden variables, and the summation over these hidden variables is taken over coalescent trees sampled by Markov chain Monte Carlo from the phased genotypes located around a target STR region. Repeat number information from the insert size data is propagated through the sampled coalescent trees, so more accurate estimation of repeat numbers is expected for STR regions longer than the read length. To find the repeat numbers that maximize the likelihood of the model, we propose a state-of-the-art belief propagation algorithm on the sampled coalescent trees. CONCLUSIONS: We verified the effectiveness of the proposed approach by comparison with existing methods using simulated datasets and real whole-genome and whole-exome data for HapMap individuals analyzed in the 1000 Genomes Project.
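The insert-size strategy described in the background can be sketched as a per-individual estimator: if a sample carries extra repeat units relative to the reference, read pairs spanning the STR map closer together on the reference than the library's mean fragment size. The Python below assumes the library mean is known and that every pair spans the STR; it is the simple single-sample baseline that coalescentSTR improves on by pooling information across individuals through their genealogy, not coalescentSTR itself. The function name is hypothetical.

```python
def repeat_delta_from_inserts(mapped_insert_sizes, library_mean, motif_len):
    """Estimate the repeat-count change relative to the reference.

    mapped_insert_sizes: insert sizes (bp) of read pairs spanning the STR,
        measured on reference coordinates.
    library_mean: mean fragment length of the library (bp), assumed known.
    motif_len: length of one repeat unit (bp).
    A positive result means extra repeat units in the sample (an expansion).
    """
    if not mapped_insert_sizes:
        raise ValueError("need at least one spanning read pair")
    mean_mapped = sum(mapped_insert_sizes) / len(mapped_insert_sizes)
    return (library_mean - mean_mapped) / motif_len


print(repeat_delta_from_inserts([380, 390, 370], library_mean=400, motif_len=4))
```

Because fragment lengths vary by tens of bases, a single individual's estimate is noisy, which is the motivation the abstract gives for sharing information across related haplotypes.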


Subjects
Microsatellite Repeats; Algorithms; Computer Simulation; Genome, Human; Humans; Models, Statistical; Sequence Analysis, DNA
12.
BMC Genomics ; 17(1): 745, 2016 Sep 21.
Article in English | MEDLINE | ID: mdl-27654840

ABSTRACT

BACKGROUND: Genome-wide association studies have revealed associations between single-nucleotide polymorphisms (SNPs) and phenotypes such as disease symptoms and drug tolerance. To compensate for small sample sizes for rare variants, association studies tend to group variants at the gene or pathway level and evaluate the effect of the set of variants. One such strategy, the sequence kernel association test (SKAT), is a widely used collapsing method. However, the p-values reported by SKAT tend to be biased because they are calculated from the asymptotic properties of the statistic. Although this bias can be corrected by applying permutation procedures to the test statistics, the computational cost of obtaining high-resolution p-values is prohibitive. RESULTS: To address this problem, we devise an adaptive SKAT procedure, termed AP-SKAT, that efficiently classifies significant SNP sets and ranks them according to their permuted p-values. Our procedure adaptively stops the permutation test when the significance level falls outside a confidence interval of the estimated p-value based on a binomial distribution. To evaluate performance, we first compare the power, sample size calculation and type I error rate estimates of SKAT, SKAT-O and the proposed procedure using genotype data from the SKAT R package and the 1000 Genomes Project. Through computational experiments using whole-genome sequencing and SNP array data, we show that the proposed procedure is highly efficient and has accuracy comparable to that of the standard procedure. CONCLUSIONS: For several types of genetic data, the developed procedure achieved competitive power and sample size under both small and large sample size conditions while controlling type I error rates, and it estimated p-values for significant SNP sets that were consistent with those from the standard permutation test within a realistic time. This demonstrates that the procedure is sufficiently powerful for recent whole-genome sequencing and SNP array data with increasing numbers of phenotypes. Additionally, this procedure can be used in other association tests by employing alternative methods to calculate the statistics.
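The adaptive stopping rule can be sketched in Python as follows. The abstract does not say which binomial confidence interval AP-SKAT uses, so this sketch assumes a Wilson score interval at roughly 99% (z = 2.576); the batch size, the permutation cap and the `null_stat` callback (which would compute a SKAT statistic on permuted phenotypes) are likewise hypothetical placeholders rather than parts of the published procedure.

```python
import math
import random


def wilson_interval(successes, trials, z=2.576):
    """Wilson score interval for a binomial proportion (z = 2.576 is about 99%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def adaptive_permutation_pvalue(observed, null_stat, alpha=0.05,
                                batch=100, max_perm=100_000, seed=0):
    """Permutation p-value that stops once alpha lies outside the interval.

    null_stat(rng) returns one test statistic computed on permuted data.
    Returns (p_hat, number_of_permutations_used).
    """
    rng = random.Random(seed)
    exceed = 0
    n = 0
    while n < max_perm:
        for _ in range(batch):
            if null_stat(rng) >= observed:
                exceed += 1
        n += batch
        lo, hi = wilson_interval(exceed, n)
        if alpha < lo or alpha > hi:
            break  # the significance decision is already clear
    # Add-one correction so the reported p-value is never exactly zero.
    return (exceed + 1) / (n + 1), n


# Toy null distribution: standard normal statistics, observed value 3.0.
p, used = adaptive_permutation_pvalue(3.0, lambda rng: rng.gauss(0.0, 1.0))
print(p, used)
```

Permutations stop as soon as the interval excludes alpha, so clearly non-significant sets finish after a single batch, while sets whose p-value sits close to alpha keep permuting up to `max_perm`; this is where the savings over a fixed, large number of permutations come from.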

14.
BMC Genomics ; 17 Suppl 1: 2, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818838

ABSTRACT

BACKGROUND: RNA sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurately estimating allele-specific expression (ASE) from read alignments to the reference genome is challenging, because the reference is a mosaic haploid genome that contains only one allele at each locus. Even when diploid genome sequences are available, aligning reads precisely to the correct allele is difficult because of the high similarity between corresponding allele sequences. RESULTS: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In this statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through simulation data analysis, we demonstrate that the proposed approach identifies ASE more effectively than the existing approach. We also show that our approach quantifies isoform expression levels better than the existing methods TIGAR2, RSEM and Cufflinks. In the analysis of real data from the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation was detected. CONCLUSIONS: The proposed method, called ASE-TIGAR, enables accurate allele-specific estimation of gene expression from RNA-Seq data. Our results show the effectiveness of using personal genomic information to estimate ASE accurately. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar .
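To illustrate what treating the haplotype of origin as a hidden variable means, here is a deliberately reduced Python sketch: one gene, two haplotypes, no isoforms, and plain expectation-maximization (EM) in place of the variational Bayesian inference that ASE-TIGAR uses. Each read is assumed to carry its likelihood under haplotype A and haplotype B (how these are obtained from diploid alignments is not shown); ambiguous reads are split according to the current estimate. The function name is hypothetical.

```python
def estimate_allelic_fraction(read_likelihoods, n_iter=1000, tol=1e-12):
    """EM estimate of the fraction of reads originating from haplotype A.

    read_likelihoods: list of (P(read | hap A), P(read | hap B)) pairs.
    """
    theta = 0.5  # start from balanced expression
    for _ in range(n_iter):
        total, used = 0.0, 0
        for la, lb in read_likelihoods:
            denom = theta * la + (1.0 - theta) * lb
            if denom > 0.0:
                total += theta * la / denom  # E-step: posterior that the read came from A
                used += 1
        if used == 0:
            raise ValueError("no informative reads")
        new_theta = total / used  # M-step: mean responsibility
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Three reads unique to A, one unique to B, four equally compatible with both.
reads = [(1.0, 0.0)] * 3 + [(0.0, 1.0)] + [(0.5, 0.5)] * 4
print(estimate_allelic_fraction(reads))
```

Reads that match both haplotypes equally carry no allelic information, so the estimate converges to the ratio set by the allele-informative reads; a full model would additionally share reads across isoforms and place priors on the fractions.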


Subjects
Gene Expression Regulation; Genome, Human; RNA/metabolism; Algorithms; Alleles; Bayes Theorem; Cell Line, Tumor; Diploidy; Humans; Protein Isoforms/genetics; Protein Isoforms/metabolism; Proteins/genetics; Proteins/metabolism; RNA/chemistry; RNA/genetics; Sequence Analysis, RNA
15.
Nat Commun ; 6: 8018, 2015 Aug 21.
Article in English | MEDLINE | ID: mdl-26292667

ABSTRACT

The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and the construction of a Japanese population reference panel (1KJPN). Through this high-coverage sequencing (32.4× on average), we identify 21.2 million single-nucleotide variants (SNVs), including 12 million novel SNVs, at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures of purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions and 25,923 genic copy-number variants. The 1KJPN panel was effective for genome-wide imputation of genotypes in the Japanese population. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, such as this one, which covers 99.0% of SNVs with minor allele frequency ≥0.1%, and for identifying causal rare variants of complex human disease phenotypes in genetic association studies.


Subjects
Asian People/genetics; Genetic Variation; Genome, Human; Haplotypes; Humans
16.
J Hum Genet ; 60(10): 581-7, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26108142

ABSTRACT

The Tohoku Medical Megabank Organization constructed a reference panel (referred to as the 1KJPN panel) containing >20 million single-nucleotide polymorphisms (SNPs) from the whole-genome sequence data of 1,070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, using the 1KJPN panel, we designed a novel custom SNP array, named the Japonica array, that is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, Y-chromosome and mitochondrial SNPs, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 Genomes Project panel. For common SNPs (minor allele frequency (MAF) >5%), the genomic coverage of the Japonica array (r² > 0.8) was 96.9%; that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%
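Genomic coverage at r² > 0.8, as used above, is the fraction of target SNPs that have at least one array SNP in strong linkage disequilibrium with them. The Python sketch below assumes r² is computed as the squared Pearson correlation of unphased genotype dosages (0/1/2) in a reference sample, a common approximation; the abstract does not state how r² was computed, and the toy genotypes and function names are invented for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP cannot be tagged
    return sxy * sxy / (sxx * syy)


def genomic_coverage(targets, array_snps, threshold=0.8):
    """Fraction of target SNPs whose best r^2 with any array SNP exceeds threshold."""
    covered = sum(
        1 for t in targets
        if max(r_squared(t, s) for s in array_snps) > threshold
    )
    return covered / len(targets)


# Dosages for five individuals; rows are SNPs (toy data).
targets = [[0, 1, 2, 0, 1], [2, 1, 0, 2, 1], [0, 0, 1, 0, 0]]
array_snps = [[0, 1, 2, 0, 1]]
print(genomic_coverage(targets, array_snps))
```

Note that r² ignores the sign of the correlation, so a target whose minor allele tracks the array SNP's major allele (the second toy row) still counts as covered.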

Subjects
Genotype; Genotyping Techniques/methods; Haplotypes; Oligonucleotide Array Sequence Analysis; Polymorphism, Single Nucleotide; Asian People; Chromosomes, Human, Y/genetics; DNA, Mitochondrial/genetics; Female; Genome-Wide Association Study; Humans; Japan; Male
17.
BMC Genomics ; 16 Suppl 2: S7, 2015.
Article in English | MEDLINE | ID: mdl-25708870

ABSTRACT

BACKGROUND: Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphism of HLA loci, high-resolution HLA typing is challenging even with whole-genome sequencing data. RESULTS: We have developed a computational tool, HLA-VBSeq, that estimates the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and the abundance of reads on HLA alleles by variational Bayesian inference. We show the advantage of the proposed method over other methods by predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1, -DQB1 and -DRB1) loci from simulated data at various depths of coverage and from real sequencing data of human trio samples. CONCLUSIONS: HLA-VBSeq is an efficient and accurate HLA typing method for high-throughput sequencing data that does not require primer design for HLA loci. Moreover, it does not assume any prior knowledge of HLA allele frequencies, so HLA-VBSeq is broadly applicable to human samples from genetically diverse populations.


Subjects
Computational Biology/methods; Genome, Human; HLA Antigens/genetics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Histocompatibility Testing/statistics & numerical data; Algorithms; Alleles; Bayes Theorem; Gene Frequency; Genotype; High-Throughput Nucleotide Sequencing/methods; Histocompatibility Testing/methods; Humans; Internet; Polymorphism, Genetic; Reproducibility of Results
18.
BMC Bioinformatics ; 16 Suppl 1: S4, 2015.
Article in English | MEDLINE | ID: mdl-25707811

ABSTRACT

BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their associations with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping CNV loci. RESULTS: We propose a novel approach that simultaneously infers copy unit alleles and their numbers in each sample from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, a well-studied model for document classification problems. In simulation studies, we evaluated the concordance between inferred and true copy unit alleles for lower-, middle- and higher-copy-number datasets; precision and recall were ≥0.9 for data with a mean coverage of ≥10× per copy unit. We also applied the approach to HTS data from 1,123 samples at the highly variable salivary amylase gene locus and at a pseudogene locus, and confirmed the consistency of the estimated alleles among samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features, including human diseases.


Subjects
Alleles; Computational Biology/methods; DNA Copy Number Variations/genetics; High-Throughput Nucleotide Sequencing; Amylases/genetics; Bayes Theorem; Female; Genetics, Population; Haplotypes; Humans; Male; Models, Statistical; Pedigree; Phenotype; Saliva/enzymology; Utah
19.
BMC Genomics ; 15: 664, 2014 Aug 08.
Article in English | MEDLINE | ID: mdl-25103311

ABSTRACT

BACKGROUND: Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in the sequencing fluidics. RESULTS: We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data through a user-friendly graphical user interface (GUI) with interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signs of technical errors during sequencing. Sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or by GUI-assisted operations implemented in SUGAR. The automated data-cleaning function, based on sequence read quality (Phred) scores, was applied to public whole human genome sequencing data and was shown to improve overall mapping quality. CONCLUSION: The detailed data evaluation and cleaning enabled by SUGAR should reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. The software should therefore be especially useful for controlling the quality of variant calls from low-abundance cell populations (e.g., cancer cells) in samples affected by technical errors in the sequencing procedure.
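SUGAR's starting point, locating flowcell regions whose reads have unusually low Phred scores, can be approximated by aggregating base qualities per tile. The Python sketch below assumes Phred+33 encoding and Illumina CASAVA 1.8-style read names (`@instrument:run:flowcell:lane:tile:x:y ...`); the 20.0 threshold and the function names are hypothetical, and SUGAR itself works at finer subtile resolution and adds heatmaps and interactive review.

```python
from collections import defaultdict


def tile_mean_quality(fastq_lines, offset=33):
    """Mean Phred score per (lane, tile) from 4-line FASTQ records.

    Assumes CASAVA 1.8+ read names: @instrument:run:flowcell:lane:tile:x:y ...
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for i in range(0, len(fastq_lines) - 3, 4):
        header, qual = fastq_lines[i], fastq_lines[i + 3].strip()
        fields = header[1:].split()[0].split(":")
        key = (fields[3], fields[4])  # (lane, tile)
        sums[key] += sum(ord(c) - offset for c in qual)
        counts[key] += len(qual)
    return {key: sums[key] / counts[key] for key in sums if counts[key]}


def flag_low_quality_tiles(tile_means, min_mean=20.0):
    """Return (lane, tile) keys whose mean quality falls below min_mean."""
    return sorted(key for key, m in tile_means.items() if m < min_mean)


records = [
    "@M1:1:FC1:1:1101:1000:2000 1:N:0:1", "ACGT", "+", "IIII",  # Q40 bases
    "@M1:1:FC1:1:1102:1000:2000 1:N:0:1", "ACGT", "+", "####",  # Q2 bases
]
means = tile_mean_quality(records)
print(means, flag_low_quality_tiles(means))
```

Reads from flagged tiles could then be dropped before mapping; subdividing each tile by the x/y coordinates in the read name would move this sketch closer to SUGAR's subtile view.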


Subjects
Computational Biology/methods; Computer Graphics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software; Statistics as Topic/methods; User-Computer Interface; Humans
20.
BMC Genomics ; 15 Suppl 10: S5, 2014.
Article in English | MEDLINE | ID: mdl-25560536

ABSTRACT

BACKGROUND: High-throughput RNA sequencing (RNA-Seq) enables the quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to new types of sequencing technologies as well as improvements in the chemical reagents for next-generation sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g., >250 bp). RESULTS: We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed- and variable-length RNA-Seq data. Our method models the substitution, deletion and insertion errors of sequencers on the basis of gapped alignments of reads to the reference cDNA sequences, so that sensitive read aligners such as Bowtie2 and BWA-MEM can be incorporated effectively into our pipeline. In addition, a heuristic algorithm is implemented in the variational Bayesian inference for faster computation. We apply TIGAR2 to both simulated data and real data from human samples and evaluate its transcript quantification performance against existing methods. CONCLUSIONS: TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for fixed-length reads (100 bp, 250 bp, 500 bp and 1000 bp, both single-end and paired-end) and for variable-length reads, especially for reads longer than 250 bp.


Subjects
Computational Biology/methods; RNA Isoforms/genetics; RNA, Messenger/genetics; Sequence Analysis, RNA/methods; Algorithms; Bayes Theorem; Gene Expression Profiling; Genetic Variation; HeLa Cells; Humans; Software