1.
Sci Rep ; 13(1): 4900, 2023 03 25.
Article in English | MEDLINE | ID: mdl-36966180

ABSTRACT

The molecular pathophysiology underlying lumbar spondylosis development remains unclear. To identify genetic factors associated with lumbar spondylosis, we conducted a genome-wide association study using 83 severe lumbar spondylosis cases and 182 healthy controls and identified 65 candidate disease-associated single nucleotide polymorphisms (SNPs). Replication analysis in 510 case and 911 control subjects from five independent Japanese cohorts identified rs2054564, located in intron 7 of ADAMTS17, as a disease-associated SNP that reached the genome-wide significance threshold (P = 1.17 × 10⁻¹¹, odds ratio = 1.92). This association remained significant even after adjustment for age, sex, and body mass index (P = 7.52 × 10⁻¹¹). A replication study in a Korean cohort, including 123 case and 319 control subjects, also verified the significant association of this SNP with severe lumbar spondylosis. Immunohistochemistry revealed that fibrillin-1 (FBN1) and ADAMTS17 were co-expressed in the annulus fibrosus of intervertebral discs (IVDs). ADAMTS17 overexpression in MG63 cells promoted extracellular microfibril biogenesis, suggesting the potential role of ADAMTS17 in IVD function through interaction with fibrillin fibers. Finally, we provided evidence of FBN1 involvement in IVD function by showing that lumbar IVDs in patients with Marfan syndrome, caused by heterozygous FBN1 gene mutation, were significantly more degenerated. We identified a common SNP variant, located in ADAMTS17, associated with susceptibility to lumbar spondylosis and demonstrated the potential role of the ADAMTS17-fibrillin network in IVDs in lumbar spondylosis development.


Subject(s)
Intervertebral Disc , Spinal Osteoarthritis , Spondylosis , Humans , Fibrillin-1 , Fibrillins/analysis , Genome-Wide Association Study , Intervertebral Disc/chemistry , Microfibrils , Spondylosis/genetics
2.
Article in English | MEDLINE | ID: mdl-29994538

ABSTRACT

The Burrows-Wheeler transform (BWT) of short-read data has unexplored potential utilities, such as for efficient and sensitive variation analysis against multiple reference genome sequences, because it does not depend on any particular reference genome sequence, unlike conventional mapping-based methods. However, since the amount of read data is generally much larger than the size of the reference sequence, computation of the BWT of reads is not easy, and this hampers development of potential applications. To alleviate this problem, a new method of computing the BWT of reads in parallel is proposed. The BWT, corresponding to a sorted list of suffixes of reads, is constructed incrementally by successively including longer and longer suffixes. The working data is divided into more than 10,000 "blocks" corresponding to sublists of suffixes with the same prefixes. Thousands of groups of blocks can be processed in parallel while making exclusive writes and concurrent reads into a shared memory. Reads and writes are basically sequential, and the read concurrency is limited to two. Thus, a fine-grained parallelism, referred to as prefix parallelism, is expected to work efficiently. The time complexity for processing n reads of length l is O(nl²). On actual biological DNA sequence data of about 100 Gbp with a read length of 100 bp (base pairs), a tentative implementation of the proposed method took less than an hour on a single-node computer; i.e., it was about three times faster than one of the fastest programs developed so far.


Subject(s)
Algorithms , Data Compression/methods , Genetic Databases , DNA Sequence Analysis/methods , Genomics , Humans , Time Factors
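
As a point of reference for the record above, the following minimal sketch shows what "the BWT of reads" means as a sorted list of read suffixes. It is a naive, sequential construction that sorts every suffix directly, not the incremental prefix-parallel method the abstract proposes. The function name `bwt_of_reads`, the `$` end marker, and the rule of breaking ties by read index are assumptions made for illustration.

```python
def bwt_of_reads(reads):
    """Naive BWT of a read collection (illustrative only).

    Every read gets a "$" end marker; all suffixes of all reads are
    sorted, and the symbol preceding each suffix is emitted in that order.
    """
    suffixes = []
    for read_index, read in enumerate(reads):
        text = read + "$"  # "$" sorts before A, C, G, T in ASCII
        for start in range(len(text)):
            # The symbol before the whole-read suffix is taken to be "$".
            preceding = text[start - 1] if start > 0 else "$"
            suffixes.append((text[start:], read_index, preceding))
    # Identical suffixes from different reads are ordered by read index.
    suffixes.sort(key=lambda item: (item[0], item[1]))
    return "".join(preceding for _, _, preceding in suffixes)


if __name__ == "__main__":
    print(bwt_of_reads(["ACG", "ACT"]))
```

Because this sketch materialises and sorts every suffix, it is only practical for toy inputs; it is meant to make the definition concrete, not to reflect the cost of the method in the abstract.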
3.
BMC Bioinformatics ; 16 Suppl 18: S5, 2015.
Article in English | MEDLINE | ID: mdl-26678411

ABSTRACT

BACKGROUND: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT basically serves as a lossless dictionary of reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. Thus, it is naturally expected to lead to development of sensitive methods for analysis of short-read data. Recently, one of the most active areas of research in sequence analysis is sensitive detection of rare genomic rearrangements from whole-genome sequencing (WGS) data of heterogeneous cancer samples. The application of the BWT of reads to the analysis of genomic rearrangements is addressed in this study. RESULTS: A new method for sensitive detection of genomic rearrangements by using the BWT of reads in the following three steps is proposed: first, breakpoint regions, which contain breakpoints and are joined together by rearrangement, are predicted from the distribution of so-called discordant pairs by using a kind of conjugate gradient method; second, reads partially matching the breakpoint regions are collected from the BWT of reads; and third, breakpoints are detected as branching points among the collected reads, and their precise positions are determined. The method was experimentally implemented, and its performance (i.e., sensitivity and specificity) was evaluated by using simulated data with known artificial rearrangements. It was applied to publicly available real biological WGS data of cancer patients, and the detection results were compared with published results. CONCLUSIONS: Serving as a lossless dictionary of reads, the BWT of short reads enables sensitive analysis of genomic rearrangements in heterogeneous cancer-genome samples when used in conjunction with breakpoint-region predictions based on a conjugate gradient method.


Subject(s)
Algorithms , Genomics , Genetic Databases , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Single Nucleotide Polymorphism , DNA Sequence Analysis , Software
4.
Brain Behav Immun ; 49: 148-55, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25986216

ABSTRACT

Etiology of narcolepsy-cataplexy involves multiple genetic and environmental factors. While the human leukocyte antigen (HLA)-DRB1*15:01-DQB1*06:02 haplotype is strongly associated with narcolepsy, it is not sufficient for disease development. To identify additional, non-HLA susceptibility genes, we conducted a genome-wide association study (GWAS) using Japanese samples. An initial sample set comprising 409 cases and 1562 controls was used for the GWAS of 525,196 single nucleotide polymorphisms (SNPs) located outside the HLA region. An independent sample set comprising 240 cases and 869 controls was then genotyped at 37 SNPs identified in the GWAS. We found that narcolepsy was associated with a SNP in the promoter region of chemokine (C-C motif) receptor 1 (CCR1) (rs3181077, P=1.6×10(-5), odds ratio [OR]=1.86). This rs3181077 association was replicated with the independent sample set (P=0.032, OR=1.36). We measured mRNA levels of candidate genes in peripheral blood samples of 38 cases and 37 controls. CCR1 and CCR3 mRNA levels were significantly lower in patients than in healthy controls, and CCR1 mRNA levels were associated with rs3181077 genotypes. In vitro chemotaxis assays were also performed to measure monocyte migration. We observed that monocytes from carriers of the rs3181077 risk allele had lower migration indices with a CCR1 ligand. CCR1 and CCR3 are newly discovered susceptibility genes for narcolepsy. These results highlight the potential role of CCR genes in narcolepsy and support the hypothesis that patients with narcolepsy have impaired immune function.


Subject(s)
Narcolepsy/genetics , Single Nucleotide Polymorphism , CCR1 Receptors/genetics , CCR3 Receptors/genetics , Asian People , Genome-Wide Association Study , Humans , Japan
6.
Bioinformatics ; 31(10): 1577-83, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25609790

ABSTRACT

MOTIVATION: Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed. RESULTS: The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results. It took only a few minutes to find SNPs from the BWT (with supplementary data, the fragment depth of coverage [FDC]) using a desktop workstation in the case of human exome or transcriptome sequencing data and 20 min using a dual-CPU server in the case of human genome sequencing data. The SNPs found with the proposed method largely agreed with those found by a time-consuming state-of-the-art tool, except for the cases in which the use of fragments of reads led to sensitivity loss or sequencing depth was not sufficient. These exceptions were predictable in advance on the basis of minimum length for uniqueness (MLU) and FDC defined on the reference genome. Moreover, BWT and FDC were computed in less time than it took to get the mapping results, provided that the data were large enough.


Subject(s)
Algorithms , Exome/genetics , Human Genome , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Computational Biology/methods , Humans
7.
PLoS One ; 9(11): e111715, 2014.
Article in English | MEDLINE | ID: mdl-25364816

ABSTRACT

Elucidation of the genetic susceptibility factors for diabetic retinopathy (DR) is important to gain insight into the pathogenesis of DR, and may help to define genetic risk factors for this condition. In the present study, we conducted a three-stage genome-wide association study (GWAS) to identify DR susceptibility loci in Japanese patients, which comprised a total of 837 type 2 diabetes patients with DR (cases) and 1,149 without DR (controls). From the stage 1 genome-wide scan of 446 subjects (205 cases and 241 controls) on 614,216 SNPs, 249 SNPs were selected for the stage 2 replication in 623 subjects (335 cases and 288 controls). Eight SNPs were further followed up in a stage 3 study of 297 cases and 620 controls. The top signal from the present association analysis was rs9362054 in an intron of RP1-90L14.1 showing borderline genome-wide significance (Pmet = 1.4×10(-7), meta-analysis of stage 1 and stage 2, allele model). RP1-90L14.1 is a long intergenic non-coding RNA (lincRNA) adjacent to KIAA1009/QN1/CEP162 gene; CEP162 plays a critical role in ciliary transition zone formation before ciliogenesis. The present study raises the possibility that the dysregulation of ciliary-associated genes plays a role in susceptibility to DR.


Subject(s)
Diabetic Retinopathy/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Single Nucleotide Polymorphism , Long Noncoding RNA/genetics , Adult , Aged , Cilia/genetics , Female , Humans , Japan , Male , Middle Aged
8.
J Hum Genet ; 59(5): 235-40, 2014 May.
Article in English | MEDLINE | ID: mdl-24694762

ABSTRACT

In humans, narcolepsy with cataplexy (narcolepsy) is a sleep disorder that is characterized by sleepiness, cataplexy and rapid eye movement (REM) sleep abnormalities. Narcolepsy is caused by a reduction in the number of neurons that produce hypocretin (orexin) neuropeptide. Both genetic and environmental factors contribute to the development of narcolepsy. Rare and large copy number variations (CNVs) reportedly play a role in the etiology of a number of neuropsychiatric disorders. Narcolepsy is considered a neurological disorder; therefore, we sought to investigate any possible association between rare and large CNVs and human narcolepsy. We used DNA microarray data and a CNV detection software application, PennCNV-Affy, to detect CNVs in 426 Japanese narcoleptic patients and 562 healthy individuals. Overall, we found a significant enrichment of rare and large CNVs (frequency ≤1%, size ≥100 kb) in the patients (case-control ratio of CNV count=1.54, P=5.00 × 10(-4)). Next, we extended a region-based association analysis by including CNVs of size ≥30 kb. Rare and large CNVs in PARK2 region showed a significant association with narcolepsy. Four patients were assessed to carry duplications of the gene region, whereas no controls carried the duplication, which was further confirmed by quantitative PCR assay. This duplication was also found in 2 essential hypersomnia (EHS) patients out of 171 patients. Furthermore, a pathway analysis revealed enrichments of gene disruptions by rare and large CNVs in immune response, acetyltransferase activity, cell cycle regulation and regulation of cell development. This study constitutes the first report on the risk association between multiple rare and large CNVs and the pathogenesis of narcolepsy. In the future, replication studies are needed to confirm the associations.


Subject(s)
Asian People/genetics , DNA Copy Number Variations , Genome-Wide Association Study , Narcolepsy/genetics , Case-Control Studies , Gene Regulatory Networks , Humans , Japan , Narcolepsy/metabolism , Oligonucleotide Array Sequence Analysis , Single Nucleotide Polymorphism , Signal Transduction , Ubiquitin-Protein Ligases/genetics
9.
PLoS One ; 8(4): e58618, 2013.
Article in English | MEDLINE | ID: mdl-23565137

ABSTRACT

To discover susceptibility genes of late-onset Alzheimer's disease (LOAD), we conducted a 3-stage genome-wide association study (GWAS) using three populations: Japanese from the Japanese Genetic Consortium for Alzheimer Disease (JGSCAD), Koreans, and Caucasians from the Alzheimer Disease Genetic Consortium (ADGC). In Stage 1, we evaluated data for 5,877,918 genotyped and imputed SNPs in Japanese cases (n = 1,008) and controls (n = 1,016). Genome-wide significance was observed with 12 SNPs in the APOE region. Seven SNPs from other distinct regions with p-values <2×10(-5) were genotyped in a second Japanese sample (885 cases, 985 controls), and evidence of association was confirmed for one SORL1 SNP (rs3781834, P = 7.33×10(-7) in the combined sample). Subsequent analysis combining results for several SORL1 SNPs in the Japanese, Korean (339 cases, 1,129 controls) and Caucasians (11,840 AD cases, 10,931 controls) revealed genome wide significance with rs11218343 (P = 1.77×10(-9)) and rs3781834 (P = 1.04×10(-8)). SNPs in previously established AD loci in Caucasians showed strong evidence of association in Japanese including rs3851179 near PICALM (P = 1.71×10(-5)) and rs744373 near BIN1 (P = 1.39×10(-4)). The associated allele for each of these SNPs was the same as in Caucasians. These data demonstrate for the first time genome-wide significance of LOAD with SORL1 and confirm the role of other known loci for LOAD in Japanese. Our study highlights the importance of examining associations in multiple ethnic populations.


Subject(s)
Alzheimer Disease/genetics , Asian People/genetics , Genetic Predisposition to Disease , LDL-Receptor Related Proteins/genetics , Membrane Transport Proteins/genetics , White People/genetics , Alleles , Chromosome Mapping , Human Chromosomes Pair 11 , Genome-Wide Association Study , Genotype , Humans , Japan , Odds Ratio , Single Nucleotide Polymorphism , Republic of Korea
10.
J Bioinform Comput Biol ; 10(4): 1250002, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22809415

ABSTRACT

Myers' elegant and powerful bit-parallel dynamic programming algorithm for approximate string matching has a restriction that the query length should be within the word size of the computer, typically 64. We propose a modification of Myers' algorithm, in which the modification has a restriction not on the query length but on the maximum number of mismatches (substitutions, insertions, or deletions), which should be less than half of the word size. The time complexity is O(m log |Σ|), where m is the query length and |Σ| is the size of the alphabet Σ. Thus, it is particularly suited for sequences on a small alphabet such as DNA sequences. In particular, it is useful in quickly extending a large number of seed alignments against a reference genome for high-throughput short-read data produced by next-generation DNA sequencers.


Subject(s)
Algorithms , Base Sequence , DNA/chemistry , Computational Biology , Genome , Sequence Alignment , DNA Sequence Analysis
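
For context on the record above, here is a minimal sketch of the original Myers bit-parallel algorithm for approximate string matching, i.e. the version whose query length is limited by the machine word size. It is not the modification proposed in the abstract. Python integers have arbitrary precision, so the sketch emulates an m-bit word with an explicit mask; the function name `myers_search` and its return convention (end positions in the text) are assumptions made for illustration.

```python
def myers_search(pattern, text, k):
    """Return every end position j in text where some substring ending
    at j matches pattern with at most k edits (substitutions,
    insertions, or deletions)."""
    m = len(pattern)
    if m == 0:
        return []
    mask = (1 << m) - 1   # emulates an m-bit machine word
    high = 1 << (m - 1)   # bit of the last pattern row
    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0      # vertical +1 / -1 deltas of the DP column
    score = m             # DP value in the last row of the column
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # No low bit is shifted in: row 0 is all zeros in search mode.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(myers_search("ACGT", "TTACGTTTACTTT", 1))
```

Each text character is processed with a constant number of word operations, which is where the word-size restriction on the query length comes from when the mask is a real machine word.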
11.
PLoS One ; 7(6): e39175, 2012.
Article in English | MEDLINE | ID: mdl-22737229

ABSTRACT

Hepatitis B virus (HBV) infection can lead to serious liver diseases, including liver cirrhosis (LC) and hepatocellular carcinoma (HCC); however, about 85-90% of infected individuals become inactive carriers with sustained biochemical remission and very low risk of LC or HCC. To identify host genetic factors contributing to HBV clearance, we conducted genome-wide association studies (GWAS) and replication analysis using samples from HBV carriers and spontaneously HBV-resolved Japanese and Korean individuals. Association analysis in the Japanese and Korean data identified the HLA-DPA1 and HLA-DPB1 genes with P(meta) = 1.89×10⁻¹² for rs3077 and P(meta) = 9.69×10⁻¹⁰ for rs9277542. We also found that the HLA-DPA1 and HLA-DPB1 genes were significantly associated with protective effects against chronic hepatitis B (CHB) in Japanese, Korean and other Asian populations, including Chinese and Thai individuals (P(meta) = 4.40×10⁻¹⁹ for rs3077 and P(meta) = 1.28×10⁻¹⁵ for rs9277542). These results suggest that the associations between the HLA-DP locus and the protective effects against persistent HBV infection and with clearance of HBV were replicated widely in East Asian populations; however, there are no reports of GWAS in Caucasian or African populations. Based on the GWAS in this study, there were no significant SNPs associated with HCC development. To clarify the pathogenesis of CHB and the mechanisms of HBV clearance, further studies are necessary, including functional analyses of the HLA-DP molecule.


Subject(s)
Genome-Wide Association Study , HLA-DP Antigens/immunology , Hepatitis B Virus/genetics , Chronic Hepatitis B/prevention & control , Chronic Hepatitis B/virology , Female , Genotype , HLA-DP Antigens/genetics , HLA-DP alpha-Chains/genetics , HLA-DP beta-Chains/genetics , Haplotypes , Hepatitis B/genetics , Chronic Hepatitis B/immunology , Humans , Japan , Korea , Linkage Disequilibrium , Male , Odds Ratio , Oligonucleotide Array Sequence Analysis , Single Nucleotide Polymorphism , Prevalence , Principal Component Analysis , Remission Induction
12.
BMC Bioinformatics ; 12: 469, 2011 Dec 12.
Article in English | MEDLINE | ID: mdl-22151604

ABSTRACT

BACKGROUND: Multiple genetic factors and their interactive effects are speculated to contribute to complex diseases. Detecting such genetic interactive effects, i.e., epistatic interactions, however, remains a significant challenge in large-scale association studies. RESULTS: We have developed a new method, named SNPInterForest, for identifying epistatic interactions by extending an ensemble learning technique called random forest. Random forest is a predictive method that has been proposed for use in discovering single-nucleotide polymorphisms (SNPs), which are most predictive of the disease status in association studies. However, it is less sensitive to SNPs with little marginal effect. Furthermore, it does not natively exhibit information on interaction patterns of susceptibility SNPs. We extended the random forest framework to overcome the above limitations by means of (i) modifying the construction of the random forest and (ii) implementing a procedure for extracting interaction patterns from the constructed random forest. The performance of the proposed method was evaluated by simulated data under a wide spectrum of disease models. SNPInterForest performed very well in successfully identifying pure epistatic interactions with high precision and was still more than capable of concurrently identifying multiple interactions under the existence of genetic heterogeneity. It was also performed on real GWAS data of rheumatoid arthritis from the Wellcome Trust Case Control Consortium (WTCCC), and novel potential interactions were reported. CONCLUSIONS: SNPInterForest, offering an efficient means to detect epistatic interactions without statistical analyses, is promising for practical use as a way to reveal the epistatic interactions involved in common complex diseases.


Subject(s)
Genetic Epistasis , Genome-Wide Association Study , Rheumatoid Arthritis/genetics , Case-Control Studies , Computer Simulation , Genetic Predisposition to Disease , Genotype , Humans , Single Nucleotide Polymorphism
13.
J Hum Genet ; 56(12): 852-6, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22011818

ABSTRACT

Family and twin studies have indicated that genetic factors have an important role in panic disorder (PD), whereas its pathogenesis has remained elusive. We conducted a genome-wide copy number variation (CNV) association study to elucidate the involvement of structural variants in the etiology of PD. The participants were 2055 genetically unrelated Japanese people (535 PD cases and 1520 controls). CNVs were detected using Genome-Wide Human SNP array 6.0, determined by Birdsuite and confirmed by PennCNV. They were classified as rare CNVs (found in <1% of the total sample) or common CNVs (found in ≥5%). PLINK was used to perform global burden analysis for rare CNVs and association analysis for common CNVs. The sample yielded 2039 rare CNVs and 79 common CNVs. Significant increases in the rare CNV burden in PD cases were not found. Common duplications in 16p11.2 showed Bonferroni-corrected P-values <0.05. Individuals with PD did not exhibit an increased genome-wide rare CNV burden. Common duplications were associated with PD and found in the pericentromeric region of 16p11.2, which had been reported to be rich in low copy repeats and to harbor developmental disorders, neuropsychiatric disorders and dysmorphic features.


Subject(s)
DNA Copy Number Variations , Panic Disorder/genetics , Adult , Asian People/genetics , Case-Control Studies , Human Chromosomes Pair 16 , Female , Genome-Wide Association Study , Humans , Japan , Male , Middle Aged
14.
Hum Mol Genet ; 20(17): 3507-16, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21659334

ABSTRACT

Hematologic abnormalities during current therapy with pegylated interferon and ribavirin (PEG-IFN/RBV) for chronic hepatitis C (CHC) often necessitate dose reduction and premature withdrawal from therapy. The aim of this study was to identify host factors associated with IFN-induced thrombocytopenia by genome-wide association study (GWAS). In the GWAS stage using 900K single-nucleotide polymorphism (SNP) microarrays, 303 Japanese CHC patients treated with PEG-IFN/RBV therapy were genotyped. One SNP (rs11697186) located on DDRGK1 gene on chromosome 20 showed strong associations in the minor-allele-dominant model with the decrease of platelet counts in response to PEG-IFN/RBV therapy [P = 8.17 × 10(-9); odds ratio (OR) = 4.6]. These associations were replicated in another sample set (n = 391) and the combined P-values reached 5.29 × 10(-17) (OR = 4.5). Fine mapping with 22 SNPs around DDRGK1 and ITPA genes showed that rs11697186 at the GWAS stage had a strong linkage disequilibrium with rs1127354, known as a functional variant in the ITPA gene. The ITPA-AA/CA genotype was independently associated with a higher degree of reduction in platelet counts at week 4 (P < 0.0001), as well as protection against the reduction in hemoglobin, whereas the CC genotype had significantly less reduction in the mean platelet counts compared with the AA/CA genotype (P < 0.0001 for weeks 2, 4, 8, 12), due to a reactive increase of the platelet count through weeks 1-4. Our present results may provide a valuable pharmacogenetic diagnostic tool for tailoring PEG-IFN/RBV dosing to minimize drug-induced adverse events.


Subject(s)
Antiviral Agents/therapeutic use , Genome-Wide Association Study/methods , Chronic Hepatitis C/drug therapy , Interferons/therapeutic use , Pyrophosphatases/genetics , Ribavirin/therapeutic use , Thrombocytopenia/genetics , Antiviral Agents/adverse effects , Genotype , Humans , Interferons/adverse effects , Linkage Disequilibrium/genetics , Single Nucleotide Polymorphism/genetics , Ribavirin/adverse effects , Thrombocytopenia/chemically induced
15.
BMC Genet ; 12: 29, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21385384

ABSTRACT

BACKGROUND: Array-based detection of copy number variations (CNVs) is widely used for identifying disease-specific genetic variations. However, the accuracy of CNV detection is not sufficient and results differ depending on the detection programs used and their parameters. In this study, we evaluated five widely used CNV detection programs, Birdsuite (mainly consisting of the Birdseye and Canary modules), Birdseye (part of Birdsuite), PennCNV, CGHseg, and DNAcopy from the viewpoint of performance on the Affymetrix platform using HapMap data and other experimental data. Furthermore, we identified CNVs of 180 healthy Japanese individuals using parameters that showed the best performance in the HapMap data and investigated their characteristics. RESULTS: The results indicate that Hidden Markov model-based programs PennCNV and Birdseye (part of Birdsuite), or Birdsuite show better detection performance than other programs when the high reproducibility rates of the same individuals and the low Mendelian inconsistencies are considered. Furthermore, when rates of overlap with other experimental results were taken into account, Birdsuite showed the best performance from the view point of sensitivity but was expected to include many false negatives and some false positives. The results of 180 healthy Japanese demonstrate that the ratio containing repeat sequences, not only segmental repeats but also long interspersed nuclear element (LINE) sequences both in the start and end regions of the CNVs, is higher in CNVs that are commonly detected among multiple individuals than that in randomly selected regions, and the conservation score based on primates is lower in these regions than in randomly selected regions. Similar tendencies were observed in HapMap data and other experimental data. 
CONCLUSIONS: Our results suggest that not only segmental repeats but also interspersed repeats, especially LINE sequences, are deeply involved in CNVs, particularly in common CNV formations. The detected CNVs are stored in the CNV repository database newly constructed by the "Japanese integrated database project" for sharing data among researchers. http://gwas.lifesciencedb.jp/cgi-bin/cnvdb/cnv_top.cgi.


Subject(s)
Algorithms , DNA Copy Number Variations , Genetic Databases , Genetic Models , Asian People/genetics , Humans , Markov Chains , Oligonucleotide Array Sequence Analysis
16.
Hum Mutat ; 31(9): 1003-10, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20556799

ABSTRACT

An amyotrophic lateral sclerosis (ALS) mutation database has been constructed as a publicly accessible online resource for recording the nucleotide and amino acid variants identified in genes associated with ALS, along with corresponding clinical conditions. The database currently consists of more than 600 entries, including about 180 unique variants found in 25 disease-causative or disease-related genes. In addition to published data collected from literature, novel variants identified by microarray resequencing in our laboratory are incorporated into the database. Every reported gene has a respective page that provides information on its variation positions with various statistics, clinical characteristics, and primary references, as well as gene-sequence and protein-structure information that will assist in assessing variation significance. Users can access a homology search function to find variations in arbitrary sequences of interest and to check if they have already been described in the database. This database is expected to fulfill an essential need in terms of integrating comprehensive information on genetic and clinical data related to ALS, which will subsequently deepen our understanding of the possible mechanisms of the disease, as well as help with the clinical practice and treatment of ALS. The database is accessible at: https://reseq.lifesciencedb.jp/resequence/SearchDisease.do?targetId=1. Data submission is open to all researchers and is highly encouraged.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Genetic Databases , Mutation/genetics , Base Sequence , Humans
17.
Artif Intell Med ; 49(3): 135-43, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20427165

ABSTRACT

OBJECTIVE: As more full-text biomedical papers are becoming available in digitized form online, there is a need for tools to mine information from all parts of such papers. Because the figures and legends/captions in biomedical papers provide important information about research outcomes, mining techniques targeting them have attracted a great deal of attention. In this study, we focused on pathway figures that illustrate signaling or metabolic pathways, because many of these are important in understanding disease mechanism(s). We developed a figure classification system based on textual information contained in biomedical papers to provide an automated acquisition system for such pathway figures. MATERIALS AND METHODS: We used full-text journal articles available on PubMed Central as our data set. We used several supervised machine learning methods, such as decision tree and a support vector machine, to classify figures in the data set. We compared the classification performance among the cases using only figure legends, using only sentences referring to the figure in the main text of the article, and combining figure legends with sentences referring to the figure in the main text of the article. RESULTS: Compared with previous related work, a sufficiently high performance was achieved with the figure legends alone. The performance with the sentences referring to the figure in the main text was actually lower than that with the figure legends alone, indicating that focusing on the main text alone is inadequate. The combination of legend and main text clearly had an effect, but including the prior and following sentences in addition to the sentence referring to the figure dramatically improved the performance. CONCLUSIONS: We developed an automatic pathway figure classification system based on both figure legends and the main text that has quite a high degree of accuracy. 
To our knowledge, this is the first attempt to address a figure classification task using legends and the main text, and it may provide a first stage for achieving efficient figure mining.


Subject(s)
Disease , Medical Illustration , Artificial Intelligence , Humans , Periodicals as Topic
18.
Nat Genet ; 41(10): 1105-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19749757

ABSTRACT

The recommended treatment for patients with chronic hepatitis C, pegylated interferon-alpha (PEG-IFN-alpha) plus ribavirin (RBV), does not provide sustained virologic response (SVR) in all patients. We report a genome-wide association study (GWAS) to null virological response (NVR) in the treatment of patients with hepatitis C virus (HCV) genotype 1 within a Japanese population. We found two SNPs near the gene IL28B on chromosome 19 to be strongly associated with NVR (rs12980275, P = 1.93 x 10(-13), and rs8099917, 3.11 x 10(-15)). We replicated these associations in an independent cohort (combined P values, 2.84 x 10(-27) (OR = 17.7; 95% CI = 10.0-31.3) and 2.68 x 10(-32) (OR = 27.1; 95% CI = 14.6-50.3), respectively). Compared to NVR, these SNPs were also associated with SVR (rs12980275, P = 3.99 x 10(-24), and rs8099917, P = 1.11 x 10(-27)). In further fine mapping of the region, seven SNPs (rs8105790, rs11881222, rs8103142, rs28416813, rs4803219, rs8099917 and rs7248668) located in the IL28B region showed the most significant associations (P = 5.52 x 10(-28)-2.68 x 10(-32); OR = 22.3-27.1). Real-time quantitative PCR assays in peripheral blood mononuclear cells showed lower IL28B expression levels in individuals carrying the minor alleles (P = 0.015).


Subject(s)
Antiviral Agents/therapeutic use , Genome-Wide Association Study , Hepatitis C, Chronic/drug therapy , Hepatitis C, Chronic/genetics , Interferon-alpha/therapeutic use , Interleukins/genetics , Polymorphism, Single Nucleotide , Ribavirin/therapeutic use , Alleles , Asian People/genetics , Chromosomes, Human, Pair 19 , Drug Combinations , Female , Genome, Human , Haplotypes , Hepatitis C, Chronic/virology , Humans , Interferons , Male , Treatment Outcome
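The abstract in this record reports odds ratios with 95% confidence intervals. As a reading aid, the sketch below shows one common way such figures are derived from a 2×2 allele or genotype table, using the log-odds (Woolf) approximation. The abstract does not say which method the authors used, so this is an assumption, and the function name `odds_ratio_ci` and its arguments are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    All four counts must be positive for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, the reported OR is the geometric mean of its bounds; for example, the interval 10.0-31.3 quoted above is consistent with OR = 17.7 since sqrt(10.0 × 31.3) ≈ 17.7.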
19.
J Comput Biol ; 16(11): 1601-13, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19772398

ABSTRACT

We have developed efficient in-practice algorithms for computing rank and select functions on a binary string, based on a novel data structure, a hierarchical binary string with hierarchical accumulatives. It efficiently stores decomposed information on partial summations over various scales of subregions of a given binary string, so that the required space overhead ratio is only about 3.5% irrespective of the string length. Values of rank and select functions are computed hierarchically in [(log_2 n)/8] iterations, where n is the string length. For example, for an unbiased random binary string of 64 G bits, each value of these functions can be computed in about a microsecond, on average, on a single 3.0-GHz CPU using 8+ GB of memory. We also present their applications to genome mapping problems for large-scale short-read DNA sequence data, especially produced by ultra-high-throughput new-generation DNA sequencers. The algorithms are applied to the binarization of the Burrows-Wheeler transform of the human genome DNA sequence. For the sake of high-speed performance, we adopted a somewhat stringent mapping condition that allows at most a single-base mismatch (either a substitution, insertion, or deletion of a single base) per query sequence. An experimentally implemented program mapped several thousands of sequences per second on a single 3.0-GHz CPU, several times faster than ELAND, a widely used mapping program with the Illumina-Solexa 1G analyser.


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Genome, Human/genetics , Algorithms , Base Sequence , Humans , Oligonucleotide Array Sequence Analysis , Time Factors
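For readers unfamiliar with the rank and select functions named in this record's abstract, the sketch below illustrates the general idea of answering both queries from precomputed partial sums. It is a deliberately simplified two-level version (one table of per-block cumulative counts plus a scan inside a block), not the multi-level hierarchical structure the authors describe. The class name `RankSelect` and the `block` parameter are assumptions for illustration.

```python
from bisect import bisect_left


class RankSelect:
    """Toy rank/select index over a sequence of 0/1 values."""

    def __init__(self, bits, block=8):
        self.bits = list(bits)
        self.block = block
        # cum[k] = number of 1-bits in bits[0 : k * block]
        self.cum = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.cum.append(self.cum[-1] + ones)

    def rank1(self, i):
        """Number of 1-bits in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("position out of range")
        k = i // self.block
        # Stored count up to the block start, then a short scan in the block.
        return self.cum[k] + sum(self.bits[k * self.block:i])

    def select1(self, j):
        """0-based position of the j-th 1-bit (j counts from 1)."""
        if not 1 <= j <= self.cum[-1]:
            raise ValueError("fewer than j one-bits in the string")
        # First boundary whose cumulative count reaches j; the target
        # bit lies in the block that ends at that boundary.
        k = bisect_left(self.cum, j)
        pos = (k - 1) * self.block
        remaining = j - self.cum[k - 1]
        while True:
            remaining -= self.bits[pos]
            if remaining == 0:
                return pos
            pos += 1
```

In this toy version the block table is the only auxiliary data; a production structure would pack bits into machine words and stack several levels of such tables, which is where the small space overhead quoted in the abstract comes from.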
20.
J Hum Genet ; 54(9): 543-6, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19629137

ABSTRACT

The establishment of high-throughput single-nucleotide polymorphism (SNP)-typing technologies has enabled astonishing progress to be made in genome-wide association studies (GWAS), and various novel genetic factors associated with complex diseases have been discovered. Our organization has created a public repository database (DB) to achieve a continuous and intensive management of GWAS data and to facilitate data sharing among researchers. In the GWAS DB, information on study design, quality control protocols, allele frequencies, genotype frequencies and statistical genetic analysis results is stored as publicly available data and can be accessed freely, whereas individual genotyping data and raw data are stored as restricted data and can only be accessed with authorization. All data are presented by a graphic viewer, which is designed to be user friendly for researchers who are not familiar with GWAS, to accelerate disease-related studies. Furthermore, the DB allows users to compare various study results obtained by different institutions and on different platforms. The same data are also managed as a distributed annotation system to call up useful data from other DBs and to superimpose them on the GWAS data for help in interpretation. The DB is accessible at https://gwas.lifesciencedb.jp/.


Subject(s)
Asian People/genetics , Databases, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Computational Biology , Gene Frequency , Genome, Human , Humans