1.
Bioinformatics ; 31(10): 1577-83, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25609790

ABSTRACT

MOTIVATION: Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed. RESULTS: The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results. It took only a few minutes to find SNPs from the BWT (with supplementary data, the fragment depth of coverage [FDC]) using a desktop workstation in the case of human exome or transcriptome sequencing data, and 20 min using a dual-CPU server in the case of human genome sequencing data. The SNPs found with the proposed method largely agreed with those found by a time-consuming state-of-the-art tool, except in cases where the use of fragments of reads led to sensitivity loss or the sequencing depth was insufficient. These exceptions were predictable in advance on the basis of the minimum length for uniqueness (MLU) and the FDC defined on the reference genome. Moreover, the BWT and FDC were computed in less time than it took to get the mapping results, provided that the data were large enough.
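As a rough illustration of why a BWT of reads can act as a dictionary, the sketch below builds the BWT of a toy read collection by naive suffix sorting and counts how often a k-mer occurs across all reads by backward search. It is not the authors' implementation: the function names (`bwt_of_reads`, `count_occurrences`), the `$` terminator, and the tie-breaking by read index are assumptions made for this example, and a practical tool would use rank structures rather than `str.count`.

```python
def bwt_of_reads(reads):
    """Naive BWT of a read collection (assumption: reads contain no '$').

    Every read gets a '$' terminator; identical suffixes from different
    reads are ordered by read index so that the sort is well defined.
    """
    records = []
    for r_idx, read in enumerate(reads):
        s = read + "$"
        for i in range(len(s)):
            # s[i - 1] is the character preceding the suffix
            # (for i == 0 this wraps around to the read's own '$').
            records.append((s[i:], r_idx, s[i - 1]))
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return "".join(rec[2] for rec in records)


def count_occurrences(bwt, pattern):
    """Count occurrences of `pattern` in all reads by backward search."""
    # first_row[c] = number of characters in the BWT that sort before c.
    first_row, total = {}, 0
    for c in sorted(set(bwt)):
        first_row[c] = total
        total += bwt.count(c)
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + bwt[:lo].count(c)
        hi = first_row[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA", "ACGA"]  # hypothetical reads
    toy_bwt = bwt_of_reads(toy_reads)
    for kmer in ("ACG", "ACGT", "ACGA"):
        print(kmer, count_occurrences(toy_bwt, kmer))
```

Comparing the counts of two k-mers that share a context but differ in their last base (here `ACGT` and `ACGA`) hints at how a variant can show up as competing extensions of the same sequence.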


Subject(s)
Algorithms , Exome/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Humans
2.
BMC Bioinformatics ; 16 Suppl 18: S5, 2015.
Article in English | MEDLINE | ID: mdl-26678411

ABSTRACT

BACKGROUND: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT basically serves as a lossless dictionary of reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. Thus, it is naturally expected to lead to the development of sensitive methods for the analysis of short-read data. Recently, one of the most active areas of research in sequence analysis has been the sensitive detection of rare genomic rearrangements from whole-genome sequencing (WGS) data of heterogeneous cancer samples. The application of the BWT of reads to the analysis of genomic rearrangements is addressed in this study. RESULTS: A new method for sensitive detection of genomic rearrangements by using the BWT of reads in the following three steps is proposed: first, breakpoint regions, which contain breakpoints and are joined together by rearrangement, are predicted from the distribution of so-called discordant pairs by using a kind of conjugate gradient method; second, reads partially matching the breakpoint regions are collected from the BWT of reads; and third, breakpoints are detected as branching points among the collected reads, and their precise positions are determined. The method was experimentally implemented, and its performance (i.e., sensitivity and specificity) was evaluated by using simulated data with known artificial rearrangements. It was applied to publicly available real biological WGS data of cancer patients, and the detection results were compared with published results. CONCLUSIONS: Serving as a lossless dictionary of reads, the BWT of short reads enables sensitive analysis of genomic rearrangements in heterogeneous cancer-genome samples when used in conjunction with breakpoint-region predictions based on a conjugate gradient method.


Subject(s)
Algorithms , Genomics , Databases, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Software
3.
Brain Behav Immun ; 49: 148-55, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25986216

ABSTRACT

Etiology of narcolepsy-cataplexy involves multiple genetic and environmental factors. While the human leukocyte antigen (HLA)-DRB1*15:01-DQB1*06:02 haplotype is strongly associated with narcolepsy, it is not sufficient for disease development. To identify additional, non-HLA susceptibility genes, we conducted a genome-wide association study (GWAS) using Japanese samples. An initial sample set comprising 409 cases and 1562 controls was used for the GWAS of 525,196 single nucleotide polymorphisms (SNPs) located outside the HLA region. An independent sample set comprising 240 cases and 869 controls was then genotyped at 37 SNPs identified in the GWAS. We found that narcolepsy was associated with a SNP in the promoter region of chemokine (C-C motif) receptor 1 (CCR1) (rs3181077, P=1.6×10(-5), odds ratio [OR]=1.86). This rs3181077 association was replicated with the independent sample set (P=0.032, OR=1.36). We measured mRNA levels of candidate genes in peripheral blood samples of 38 cases and 37 controls. CCR1 and CCR3 mRNA levels were significantly lower in patients than in healthy controls, and CCR1 mRNA levels were associated with rs3181077 genotypes. In vitro chemotaxis assays were also performed to measure monocyte migration. We observed that monocytes from carriers of the rs3181077 risk allele had lower migration indices with a CCR1 ligand. CCR1 and CCR3 are newly discovered susceptibility genes for narcolepsy. These results highlight the potential role of CCR genes in narcolepsy and support the hypothesis that patients with narcolepsy have impaired immune function.
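For readers less familiar with the odds ratios quoted in association studies such as this one, the following sketch shows how an allelic odds ratio and an approximate 95% confidence interval (Woolf's logit method) can be computed from a 2×2 table of allele counts. The counts used are hypothetical and are not taken from the study; the function name `odds_ratio_ci` is likewise an assumption for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 allele-count table.

    a: risk alleles in cases      b: other alleles in cases
    c: risk alleles in controls   d: other alleles in controls
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(odds_ratio_ci(30, 70, 20, 80))
```

An interval that excludes 1 corresponds to a nominally significant association at roughly the 5% level; genome-wide studies apply far stricter thresholds because of the number of SNPs tested.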


Subject(s)
Narcolepsy/genetics , Polymorphism, Single Nucleotide , Receptors, CCR1/genetics , Receptors, CCR3/genetics , Asian People , Genome-Wide Association Study , Humans , Japan
4.
J Hum Genet ; 59(5): 235-40, 2014 May.
Article in English | MEDLINE | ID: mdl-24694762

ABSTRACT

In humans, narcolepsy with cataplexy (narcolepsy) is a sleep disorder that is characterized by sleepiness, cataplexy and rapid eye movement (REM) sleep abnormalities. Narcolepsy is caused by a reduction in the number of neurons that produce hypocretin (orexin) neuropeptide. Both genetic and environmental factors contribute to the development of narcolepsy. Rare and large copy number variations (CNVs) reportedly play a role in the etiology of a number of neuropsychiatric disorders. Narcolepsy is considered a neurological disorder; therefore, we sought to investigate any possible association between rare and large CNVs and human narcolepsy. We used DNA microarray data and a CNV detection software application, PennCNV-Affy, to detect CNVs in 426 Japanese narcoleptic patients and 562 healthy individuals. Overall, we found a significant enrichment of rare and large CNVs (frequency ≤1%, size ≥100 kb) in the patients (case-control ratio of CNV count=1.54, P=5.00 × 10(-4)). Next, we extended a region-based association analysis by including CNVs of size ≥30 kb. Rare and large CNVs in the PARK2 region showed a significant association with narcolepsy. Four patients were assessed to carry duplications of the gene region, whereas no controls carried the duplication, which was further confirmed by quantitative PCR assay. This duplication was also found in 2 of 171 essential hypersomnia (EHS) patients. Furthermore, a pathway analysis revealed enrichments of gene disruptions by rare and large CNVs in immune response, acetyltransferase activity, cell cycle regulation and regulation of cell development. This study constitutes the first report on the risk association between multiple rare and large CNVs and the pathogenesis of narcolepsy. In the future, replication studies are needed to confirm the associations.


Subject(s)
Asian People/genetics , DNA Copy Number Variations , Genome-Wide Association Study , Narcolepsy/genetics , Case-Control Studies , Gene Regulatory Networks , Humans , Japan , Narcolepsy/metabolism , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Signal Transduction , Ubiquitin-Protein Ligases/genetics
5.
Hum Mol Genet ; 20(17): 3507-16, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21659334

ABSTRACT

Hematologic abnormalities during current therapy with pegylated interferon and ribavirin (PEG-IFN/RBV) for chronic hepatitis C (CHC) often necessitate dose reduction and premature withdrawal from therapy. The aim of this study was to identify host factors associated with IFN-induced thrombocytopenia by genome-wide association study (GWAS). In the GWAS stage using 900K single-nucleotide polymorphism (SNP) microarrays, 303 Japanese CHC patients treated with PEG-IFN/RBV therapy were genotyped. One SNP (rs11697186) located on DDRGK1 gene on chromosome 20 showed strong associations in the minor-allele-dominant model with the decrease of platelet counts in response to PEG-IFN/RBV therapy [P = 8.17 × 10(-9); odds ratio (OR) = 4.6]. These associations were replicated in another sample set (n = 391) and the combined P-values reached 5.29 × 10(-17) (OR = 4.5). Fine mapping with 22 SNPs around DDRGK1 and ITPA genes showed that rs11697186 at the GWAS stage had a strong linkage disequilibrium with rs1127354, known as a functional variant in the ITPA gene. The ITPA-AA/CA genotype was independently associated with a higher degree of reduction in platelet counts at week 4 (P < 0.0001), as well as protection against the reduction in hemoglobin, whereas the CC genotype had significantly less reduction in the mean platelet counts compared with the AA/CA genotype (P < 0.0001 for weeks 2, 4, 8, 12), due to a reactive increase of the platelet count through weeks 1-4. Our present results may provide a valuable pharmacogenetic diagnostic tool for tailoring PEG-IFN/RBV dosing to minimize drug-induced adverse events.


Subject(s)
Antiviral Agents/therapeutic use , Genome-Wide Association Study/methods , Hepatitis C, Chronic/drug therapy , Interferons/therapeutic use , Pyrophosphatases/genetics , Ribavirin/therapeutic use , Thrombocytopenia/genetics , Antiviral Agents/adverse effects , Genotype , Humans , Interferons/adverse effects , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Ribavirin/adverse effects , Thrombocytopenia/chemically induced
6.
Sci Rep ; 13(1): 4900, 2023 03 25.
Article in English | MEDLINE | ID: mdl-36966180

ABSTRACT

The molecular pathophysiology underlying lumbar spondylosis development remains unclear. To identify genetic factors associated with lumbar spondylosis, we conducted a genome-wide association study using 83 severe lumbar spondylosis cases and 182 healthy controls and identified 65 candidate disease-associated single nucleotide polymorphisms (SNPs). Replication analysis in 510 case and 911 control subjects from five independent Japanese cohorts identified rs2054564, located in intron 7 of ADAMTS17, as a disease-associated SNP with a genome-wide significance threshold (P = 1.17 × 10(-11), odds ratio = 1.92). This association was significant even after adjustment for age, sex, and body mass index (P = 7.52 × 10(-11)). A replication study in a Korean cohort, including 123 case and 319 control subjects, also verified the significant association of this SNP with severe lumbar spondylosis. Immunohistochemistry revealed that fibrillin-1 (FBN1) and ADAMTS17 were co-expressed in the annulus fibrosus of intervertebral discs (IVDs). ADAMTS17 overexpression in MG63 cells promoted extracellular microfibril biogenesis, suggesting the potential role of ADAMTS17 in IVD function through interaction with fibrillin fibers. Finally, we provided evidence of FBN1 involvement in IVD function by showing that lumbar IVDs in patients with Marfan syndrome, caused by heterozygous FBN1 gene mutation, were significantly more degenerated. We identified a common SNP variant, located in ADAMTS17, associated with susceptibility to lumbar spondylosis and demonstrated the potential role of the ADAMTS17-fibrillin network in IVDs in lumbar spondylosis development.


Subject(s)
Intervertebral Disc , Osteoarthritis, Spine , Spondylosis , Humans , Fibrillin-1 , Fibrillins/analysis , Genome-Wide Association Study , Intervertebral Disc/chemistry , Microfibrils , Spondylosis/genetics
7.
BMC Bioinformatics ; 12: 469, 2011 Dec 12.
Article in English | MEDLINE | ID: mdl-22151604

ABSTRACT

BACKGROUND: Multiple genetic factors and their interactive effects are speculated to contribute to complex diseases. Detecting such genetic interactive effects, i.e., epistatic interactions, however, remains a significant challenge in large-scale association studies. RESULTS: We have developed a new method, named SNPInterForest, for identifying epistatic interactions by extending an ensemble learning technique called random forest. Random forest is a predictive method that has been proposed for use in discovering single-nucleotide polymorphisms (SNPs), which are most predictive of the disease status in association studies. However, it is less sensitive to SNPs with little marginal effect. Furthermore, it does not natively expose information on interaction patterns of susceptibility SNPs. We extended the random forest framework to overcome the above limitations by means of (i) modifying the construction of the random forest and (ii) implementing a procedure for extracting interaction patterns from the constructed random forest. The performance of the proposed method was evaluated on simulated data under a wide spectrum of disease models. SNPInterForest performed very well in identifying pure epistatic interactions with high precision and was still capable of concurrently identifying multiple interactions in the presence of genetic heterogeneity. It was also applied to real GWAS data of rheumatoid arthritis from the Wellcome Trust Case Control Consortium (WTCCC), and novel potential interactions were reported. CONCLUSIONS: SNPInterForest, offering an efficient means to detect epistatic interactions without statistical analyses, is promising for practical use as a way to reveal the epistatic interactions involved in common complex diseases.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Arthritis, Rheumatoid/genetics , Case-Control Studies , Computer Simulation , Genetic Predisposition to Disease , Genotype , Humans , Polymorphism, Single Nucleotide
8.
J Hum Genet ; 56(12): 852-6, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22011818

ABSTRACT

Family and twin studies have indicated that genetic factors have an important role in panic disorder (PD), whereas its pathogenesis has remained elusive. We conducted a genome-wide copy number variation (CNV) association study to elucidate the involvement of structural variants in the etiology of PD. The participants were 2055 genetically unrelated Japanese people (535 PD cases and 1520 controls). CNVs were detected using Genome-Wide Human SNP array 6.0, determined by Birdsuite and confirmed by PennCNV. They were classified as rare CNVs (found in <1% of the total sample) or common CNVs (found in ≥5%). PLINK was used to perform global burden analysis for rare CNVs and association analysis for common CNVs. The sample yielded 2039 rare CNVs and 79 common CNVs. Significant increases in the rare CNV burden in PD cases were not found. Common duplications in 16p11.2 showed Bonferroni-corrected P-values <0.05. Individuals with PD did not exhibit an increased genome-wide rare CNV burden. Common duplications were associated with PD and found in the pericentromeric region of 16p11.2, which had been reported to be rich in low copy repeats and to harbor developmental disorders, neuropsychiatric disorders and dysmorphic features.


Subject(s)
DNA Copy Number Variations , Panic Disorder/genetics , Adult , Asian People/genetics , Case-Control Studies , Chromosomes, Human, Pair 16 , Female , Genome-Wide Association Study , Humans , Japan , Male , Middle Aged
9.
BMC Genet ; 12: 29, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21385384

ABSTRACT

BACKGROUND: Array-based detection of copy number variations (CNVs) is widely used for identifying disease-specific genetic variations. However, the accuracy of CNV detection is not sufficient and results differ depending on the detection programs used and their parameters. In this study, we evaluated five widely used CNV detection programs, Birdsuite (mainly consisting of the Birdseye and Canary modules), Birdseye (part of Birdsuite), PennCNV, CGHseg, and DNAcopy from the viewpoint of performance on the Affymetrix platform using HapMap data and other experimental data. Furthermore, we identified CNVs of 180 healthy Japanese individuals using parameters that showed the best performance in the HapMap data and investigated their characteristics. RESULTS: The results indicate that Hidden Markov model-based programs PennCNV and Birdseye (part of Birdsuite), or Birdsuite show better detection performance than other programs when the high reproducibility rates of the same individuals and the low Mendelian inconsistencies are considered. Furthermore, when rates of overlap with other experimental results were taken into account, Birdsuite showed the best performance from the viewpoint of sensitivity but was expected to include many false negatives and some false positives. The results of 180 healthy Japanese demonstrate that the ratio containing repeat sequences, not only segmental repeats but also long interspersed nuclear element (LINE) sequences both in the start and end regions of the CNVs, is higher in CNVs that are commonly detected among multiple individuals than that in randomly selected regions, and the conservation score based on primates is lower in these regions than in randomly selected regions. Similar tendencies were observed in HapMap data and other experimental data. CONCLUSIONS: Our results suggest that not only segmental repeats but also interspersed repeats, especially LINE sequences, are deeply involved in CNVs, particularly in common CNV formations. The detected CNVs are stored in the CNV repository database newly constructed by the "Japanese integrated database project" for sharing data among researchers (http://gwas.lifesciencedb.jp/cgi-bin/cnvdb/cnv_top.cgi).


Subject(s)
Algorithms , DNA Copy Number Variations , Databases, Genetic , Models, Genetic , Asian People/genetics , Humans , Markov Chains , Oligonucleotide Array Sequence Analysis
10.
Hum Mutat ; 31(9): 1003-10, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20556799

ABSTRACT

An amyotrophic lateral sclerosis (ALS) mutation database has been constructed as a publicly accessible online resource for recording the nucleotide and amino acid variants identified in genes associated with ALS, along with corresponding clinical conditions. The database currently consists of more than 600 entries, including about 180 unique variants found in 25 disease-causative or disease-related genes. In addition to published data collected from literature, novel variants identified by microarray resequencing in our laboratory are incorporated into the database. Every reported gene has a respective page that provides information on its variation positions with various statistics, clinical characteristics, and primary references, as well as gene-sequence and protein-structure information that will assist in assessing variation significance. Users can access a homology search function to find variations in arbitrary sequences of interest and to check if they have already been described in the database. This database is expected to fulfill an essential need in terms of integrating comprehensive information on genetic and clinical data related to ALS, which will subsequently deepen our understanding of the possible mechanisms of the disease, as well as help with the clinical practice and treatment of ALS. The database is accessible at: https://reseq.lifesciencedb.jp/resequence/SearchDisease.do?targetId=1. Data submission is open to all researchers and is highly encouraged.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Databases, Genetic , Mutation/genetics , Base Sequence , Humans
11.
J Hum Genet ; 54(9): 543-6, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19629137

ABSTRACT

The establishment of high-throughput single-nucleotide polymorphism (SNP)-typing technologies has enabled astonishing progress to be made in genome-wide association studies (GWAS), and various novel genetic factors associated with complex diseases have been discovered. Our organization has created a public repository database (DB) to achieve a continuous and intensive management of GWAS data and to facilitate data sharing among researchers. In the GWAS DB, information on study design, quality control protocols, allele frequencies, genotype frequencies and statistical genetic analysis results are stored as publicly available data and can be accessed freely, whereas individual genotyping data and raw data are stored as restricted data and can only be accessed with authorization. All data are presented by a graphic viewer, which is designed to be user friendly for researchers who are not familiar with GWAS to accelerate disease-related studies. Furthermore, the DB allows users to compare various study results obtained by different institutions and on different platforms. The same data are also managed as a distributed annotation system to call up useful data from other DBs and to superimpose them on the GWAS data for help in interpretation. The DB is accessible at https://gwas.lifesciencedb.jp/.


Subject(s)
Asian People/genetics , Databases, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Computational Biology , Gene Frequency , Genome, Human , Humans
12.
Genome Inform ; 23(1): 60-71, 2009 Oct.
Article in English | MEDLINE | ID: mdl-20180262

ABSTRACT

We introduce a new data structure, a localized suffix array, based on which occurrence information is dynamically represented as the combination of global positional information and local lexicographic order information in text search applications. For the search of a pair of words within a given distance, many candidate positions that share a coarse-grained global position can be compactly represented in terms of local lexicographic orders as in the conventional suffix array, and they can be simultaneously examined for violation of the distance constraint at the coarse-grained resolution. The trade-off between the positional and lexicographical information is progressively shifted towards finer positional resolution, and the distance constraint is reexamined accordingly. Thus the paired search can be efficiently performed even if there are a large number of occurrences for each word. The localized suffix array itself is in fact a reordering of bits inside the conventional suffix array, and their memory requirements are essentially the same. We demonstrate an application to genome mapping problems for paired-end short reads generated by new-generation DNA sequencers. When paired reads are highly repetitive, it is time-consuming to naïvely calculate, sort, and compare all of the coordinates. For human genome re-sequencing data with 36-base-pair reads, speedups of more than 10 times over the naïve method were observed in almost half of the cases where the sums of redundancies (number of individual occurrences) of paired reads were greater than 2,000.
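To make the baseline concrete, here is a minimal sketch of the naïve paired search that the abstract contrasts against: look up every occurrence of each word in a conventional suffix array, sort the coordinates, and compare them against the distance constraint. It does not implement the localized suffix array itself; the function names and the toy text are assumptions, and the suffix array is built by plain sorting, which is only suitable for small inputs.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Conventional suffix array by direct sorting (toy-sized inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, sa, word):
    """Sorted start positions of `word`, found by binary search on `sa`."""
    n = len(word)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose n-character prefix is >= word
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] < word:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose n-character prefix is > word
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] <= word:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def paired_hits(text, sa, left, right, max_dist):
    """All (p, q) with `left` at p, `right` at q and 0 <= q - p <= max_dist."""
    left_pos = occurrences(text, sa, left)
    right_pos = occurrences(text, sa, right)
    hits = []
    for p in left_pos:
        i = bisect_left(right_pos, p)
        j = bisect_right(right_pos, p + max_dist)
        hits.extend((p, q) for q in right_pos[i:j])
    return hits


if __name__ == "__main__":
    genome = "ACGTACGTTTACG"  # hypothetical reference text
    sa = build_suffix_array(genome)
    print(paired_hits(genome, sa, "ACG", "TT", 5))
```

The cost of this baseline grows with the number of occurrences of each word, which is the situation (highly repetitive paired reads) where the abstract reports the largest gains for the localized structure.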


Subject(s)
Genome , Algorithms , Sequence Analysis, DNA
13.
Article in English | MEDLINE | ID: mdl-29994538

ABSTRACT

The Burrows-Wheeler transform (BWT) of short-read data has unexplored potential utilities, such as for efficient and sensitive variation analysis against multiple reference genome sequences, because it does not depend on any particular reference genome sequence, unlike conventional mapping-based methods. However, since the amount of read data is generally much larger than the size of the reference sequence, computation of the BWT of reads is not easy, and this hampers development of potential applications. For the alleviation of this problem, a new method of computing the BWT of reads in parallel is proposed. The BWT, corresponding to a sorted list of suffixes of reads, is constructed incrementally by successively including longer and longer suffixes. The working data is divided into more than 10,000 "blocks" corresponding to sublists of suffixes with the same prefixes. Thousands of groups of blocks can be processed in parallel while making exclusive writes and concurrent reads into a shared memory. Reads and writes are basically sequential, and the read concurrency is limited to two. Thus, a fine-grained parallelism, referred to as prefix parallelism, is expected to work efficiently. The time complexity for processing n reads of length l is O(nl^2). On actual biological DNA sequence data of about 100 Gbp with a read length of 100 bp (base pairs), a tentative implementation of the proposed method took less than an hour on a single-node computer; i.e., it was about three times faster than one of the fastest programs developed so far.
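The idea of dividing the sorted suffix list into blocks that share a prefix can be illustrated with a much simpler scheme than the one described: group all suffixes by their first k characters, sort each block independently, and concatenate the blocks in prefix order. The sketch below does only that; it is not the incremental construction of the abstract, and the function names, the `$` terminator, the tie-breaking by read index, and the use of a thread pool are assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def suffix_records(reads):
    """Yield (suffix, read_index, preceding_char) for every read suffix."""
    for r_idx, read in enumerate(reads):
        s = read + "$"  # assumption: '$' does not occur in the reads
        for i in range(len(s)):
            yield (s[i:], r_idx, s[i - 1])


def sort_block(block):
    """Sort one block by suffix, breaking ties by read index."""
    return sorted(block, key=lambda rec: (rec[0], rec[1]))


def bwt_by_prefix_blocks(reads, k=2, workers=4):
    """BWT of a read collection, computed block by block.

    Suffixes sharing the same first k characters form one block. Blocks
    are independent of each other, so they can be sorted concurrently and
    then concatenated in lexicographic order of their prefixes.
    """
    blocks = defaultdict(list)
    for rec in suffix_records(reads):
        blocks[rec[0][:k]].append(rec)
    prefixes = sorted(blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_blocks = list(pool.map(sort_block, (blocks[p] for p in prefixes)))
    return "".join(rec[2] for block in sorted_blocks for rec in block)


if __name__ == "__main__":
    print(bwt_by_prefix_blocks(["ACGT", "CGTA", "ACGA"], k=2))
```

Because a suffix that sorts earlier never has a later k-character prefix, concatenating the sorted blocks gives the same result as one global sort, whatever value of k is chosen.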


Subject(s)
Algorithms , Data Compression/methods , Databases, Genetic , Sequence Analysis, DNA/methods , Genomics , Humans , Time Factors
14.
BMC Genomics ; 9: 431, 2008 Sep 22.
Article in English | MEDLINE | ID: mdl-18803882

ABSTRACT

BACKGROUND: With improvements in genotyping technologies, genome-wide association studies with hundreds of thousands of SNPs allow the identification of candidate genetic loci for multifactorial diseases in different populations. However, genotyping errors caused by genotyping platforms or genotype calling algorithms may lead to inflation of false associations between markers and phenotypes. In addition, the number of SNPs available for genome-wide association studies in the Japanese population has been investigated using only 45 samples in the HapMap project, which could lead to an inaccurate estimation of the number of SNPs with low minor allele frequencies. We genotyped 400 Japanese samples in order to estimate the number of SNPs available for genome-wide association studies in the Japanese population and to examine the performance of the current SNP Array 6.0 platform and the genotype calling algorithm "Birdseed". RESULTS: About 20% of the 909,622 SNP markers on the array were revealed to be monomorphic in the Japanese population. Consequently, 661,599 SNPs were available for genome-wide association studies in the Japanese population, after excluding the poorly behaving SNPs. The Birdseed algorithm accurately determined the genotype calls of each sample with a high overall call rate of over 99.5% and a high concordance rate of over 99.8% using more than 48 samples after removing low-quality samples by adjusting QC criteria. CONCLUSION: Our results confirmed that the SNP Array 6.0 platform reached the level reported by the manufacturer, and thus genome-wide association studies using the SNP Array 6.0 platform have considerable potential to identify candidate susceptibility or resistance genetic factors for multifactorial diseases in the Japanese population, as well as in other populations.


Subject(s)
Asian People/genetics , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Algorithms , Computational Biology , Gene Frequency , Humans
15.
Protein Eng Des Sel ; 17(2): 165-73, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15047913

ABSTRACT

The identification of protein-protein interaction sites is essential for the mutant design and prediction of protein-protein networks. The interaction sites of residue units were predicted using support vector machines (SVM) and the profiles of sequentially/spatially neighboring residues, plus additional information. When only sequence information was used, prediction performance was highest using the feature vectors, sequentially neighboring profiles and predicted interaction site ratios, which were calculated by SVM regression using amino acid compositions. When structural information was also used, prediction performance was highest using the feature vectors, spatially neighboring residue profiles, accessible surface areas, and the with/without protein interaction sites ratios predicted by SVM regression and amino acid compositions. In the latter case, the precision at recall = 50% was 54-56% for a homo-hetero mixed test set and >20% higher than for random prediction. Approximately 30% of the residues wrongly predicted as interaction sites were the closest sequential/spatial neighbors of interaction-site residues. The predicted residues covered 86-87% of the actual interfaces (96-97% of interfaces with over 20 residues). This prediction performance appeared to be slightly higher than that of a previously reported study. Comparing the prediction accuracy of each molecule, it seems to be easier to predict interaction sites for stable complexes.


Subject(s)
Algorithms , Computational Biology/methods , Protein Binding , Protein Interaction Mapping/methods , Binding Sites , Models, Molecular , Predictive Value of Tests , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods
16.
PLoS One ; 9(11): e111715, 2014.
Article in English | MEDLINE | ID: mdl-25364816

ABSTRACT

Elucidation of the genetic susceptibility factors for diabetic retinopathy (DR) is important to gain insight into the pathogenesis of DR, and may help to define genetic risk factors for this condition. In the present study, we conducted a three-stage genome-wide association study (GWAS) to identify DR susceptibility loci in Japanese patients, which comprised a total of 837 type 2 diabetes patients with DR (cases) and 1,149 without DR (controls). From the stage 1 genome-wide scan of 446 subjects (205 cases and 241 controls) on 614,216 SNPs, 249 SNPs were selected for the stage 2 replication in 623 subjects (335 cases and 288 controls). Eight SNPs were further followed up in a stage 3 study of 297 cases and 620 controls. The top signal from the present association analysis was rs9362054 in an intron of RP1-90L14.1 showing borderline genome-wide significance (Pmet = 1.4×10(-7), meta-analysis of stage 1 and stage 2, allele model). RP1-90L14.1 is a long intergenic non-coding RNA (lincRNA) adjacent to KIAA1009/QN1/CEP162 gene; CEP162 plays a critical role in ciliary transition zone formation before ciliogenesis. The present study raises the possibility that the dysregulation of ciliary-associated genes plays a role in susceptibility to DR.


Subject(s)
Diabetic Retinopathy/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , RNA, Long Noncoding/genetics , Adult , Aged , Cilia/genetics , Female , Humans , Japan , Male , Middle Aged
18.
PLoS One ; 8(4): e58618, 2013.
Article in English | MEDLINE | ID: mdl-23565137

ABSTRACT

To discover susceptibility genes of late-onset Alzheimer's disease (LOAD), we conducted a 3-stage genome-wide association study (GWAS) using three populations: Japanese from the Japanese Genetic Consortium for Alzheimer Disease (JGSCAD), Koreans, and Caucasians from the Alzheimer Disease Genetic Consortium (ADGC). In Stage 1, we evaluated data for 5,877,918 genotyped and imputed SNPs in Japanese cases (n = 1,008) and controls (n = 1,016). Genome-wide significance was observed with 12 SNPs in the APOE region. Seven SNPs from other distinct regions with p-values <2×10(-5) were genotyped in a second Japanese sample (885 cases, 985 controls), and evidence of association was confirmed for one SORL1 SNP (rs3781834, P = 7.33×10(-7) in the combined sample). Subsequent analysis combining results for several SORL1 SNPs in the Japanese, Koreans (339 cases, 1,129 controls) and Caucasians (11,840 AD cases, 10,931 controls) revealed genome-wide significance with rs11218343 (P = 1.77×10(-9)) and rs3781834 (P = 1.04×10(-8)). SNPs in previously established AD loci in Caucasians showed strong evidence of association in Japanese, including rs3851179 near PICALM (P = 1.71×10(-5)) and rs744373 near BIN1 (P = 1.39×10(-4)). The associated allele for each of these SNPs was the same as in Caucasians. These data demonstrate for the first time genome-wide significance of LOAD with SORL1 and confirm the role of other known loci for LOAD in Japanese. Our study highlights the importance of examining associations in multiple ethnic populations.


Subject(s)
Alzheimer Disease/genetics , Asian People/genetics , Genetic Predisposition to Disease , LDL-Receptor Related Proteins/genetics , Membrane Transport Proteins/genetics , White People/genetics , Alleles , Chromosome Mapping , Chromosomes, Human, Pair 11 , Genome-Wide Association Study , Genotype , Humans , Japan , Odds Ratio , Polymorphism, Single Nucleotide , Republic of Korea
19.
J Bioinform Comput Biol ; 10(4): 1250002, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22809415

ABSTRACT

Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm, in which the modification has a restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.
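For orientation, the following is a minimal Python sketch of the original single-word Myers bit-parallel search that this abstract builds on. It is not the modification proposed in the article. The function name `myers_search`, its parameters, and the returned `(end_index, distance)` pairs are illustrative assumptions. Python integers are unbounded, so the sketch emulates one machine word of width equal to the query length; in a fixed-width language the query would have to fit in the word size, which is the restriction the abstract describes.

```python
def myers_search(pattern, text, max_errors):
    """Report (end_index, distance) for every text position at which the
    pattern matches some substring ending there with at most max_errors
    edits (substitutions, insertions, or deletions).

    Illustrative sketch of the classic single-word algorithm; the name
    and interface are assumptions, not taken from the article.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    mask = (1 << m) - 1      # emulates a machine word of m bits
    high = 1 << (m - 1)      # bit for the last pattern row

    # peq[ch] has bit i set when pattern[i] == ch.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv = mask                # vertical deltas equal to +1
    mv = 0                   # vertical deltas equal to -1
    score = m                # edit distance in the last row
    hits = []

    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal deltas equal to +1
        mh = pv & xh                    # horizontal deltas equal to -1

        # Update the last-row score from the horizontal delta at row m.
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift without setting bit 0: a match may start at any text position.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv

        if score <= max_errors:
            hits.append((j, score))
    return hits
```

For example, `myers_search("ACGT", "TTACGTTT", 0)` returns `[(5, 0)]`, the exact occurrence ending at index 5. Each text character costs a constant number of word operations, which is what makes the approach attractive for extending many seed alignments.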


Subject(s)
Algorithms , Base Sequence , DNA/chemistry , Computational Biology , Genome , Sequence Alignment , Sequence Analysis, DNA
20.
PLoS One ; 7(6): e39175, 2012.
Article in English | MEDLINE | ID: mdl-22737229

ABSTRACT

Hepatitis B virus (HBV) infection can lead to serious liver diseases, including liver cirrhosis (LC) and hepatocellular carcinoma (HCC); however, about 85-90% of infected individuals become inactive carriers with sustained biochemical remission and very low risk of LC or HCC. To identify host genetic factors contributing to HBV clearance, we conducted genome-wide association studies (GWAS) and replication analysis using samples from HBV carriers and spontaneously HBV-resolved Japanese and Korean individuals. Association analysis in the Japanese and Korean data identified the HLA-DPA1 and HLA-DPB1 genes with P(meta) = 1.89×10⁻¹² for rs3077 and P(meta) = 9.69×10⁻¹⁰ for rs9277542. We also found that the HLA-DPA1 and HLA-DPB1 genes were significantly associated with protective effects against chronic hepatitis B (CHB) in Japanese, Korean and other Asian populations, including Chinese and Thai individuals (P(meta) = 4.40×10⁻¹⁹ for rs3077 and P(meta) = 1.28×10⁻¹⁵ for rs9277542). These results suggest that the associations between the HLA-DP locus and the protective effects against persistent HBV infection and with clearance of HBV were replicated widely in East Asian populations; however, there are no reports of GWAS in Caucasian or African populations. Based on the GWAS in this study, there were no significant SNPs associated with HCC development. To clarify the pathogenesis of CHB and the mechanisms of HBV clearance, further studies are necessary, including functional analyses of the HLA-DP molecule.


Subject(s)
Genome-Wide Association Study , HLA-DP Antigens/immunology , Hepatitis B virus/genetics , Hepatitis B, Chronic/prevention & control , Hepatitis B, Chronic/virology , Female , Genotype , HLA-DP Antigens/genetics , HLA-DP alpha-Chains/genetics , HLA-DP beta-Chains/genetics , Haplotypes , Hepatitis B/genetics , Hepatitis B, Chronic/immunology , Humans , Japan , Korea , Linkage Disequilibrium , Male , Odds Ratio , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Prevalence , Principal Component Analysis , Remission Induction