Results 1 - 20 of 29
1.
Clin Genet ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38719617

ABSTRACT

Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed by inferring historical recombination events from the genome-wide structure of linkage disequilibrium (a non-random association of alleles among loci) by using population-scale sequencing data. We constructed a fine-scale genetic map and identified recombination hotspots from 10 092 551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals whose genotypes were determined by high-coverage (30×) whole-genome sequencing, and the genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate.

2.
J Hum Genet ; 66(1): 61-65, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32782383

ABSTRACT

Large-scale, sometimes nationwide, prospective genomic cohorts that biobank rich biological specimens such as blood, urine and tissues have been established in several countries and have released vast amounts of data. These genetic and epidemiological resources are expected to allow investigators to disentangle the genetic and environmental components conferring common complex diseases. There are, however, two major challenges to statistical genetics for this goal: small sample size combined with high dimensionality, and multilayered, heterogeneous endophenotypes. Rather counterintuitively, biobank data generally have a small sample size relative to their data dimensionality, which consists of genomic variation, lifestyle questionnaires, and sometimes their interactions. This is a widely acknowledged difficulty in data analysis, the so-called "p ≫ n problem" in statistics or "curse of dimensionality" in the machine-learning field. On the other hand, we have too many measurements of individual health status, which are endophenotypes, such as health check-up data, images, and psychological test scores, in addition to metabolomics and proteomics data. These endophenotypes are rich but not so tractable because of their even higher dimensionality and the substantial correlation, and sometimes confusing causation, among them. We have tried to overcome the problems inherent to biobank data using statistical machine-learning and deep-learning technologies.


Subjects
Artificial Intelligence , Biological Specimen Banks/statistics & numerical data , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics/methods , Polymorphism, Single Nucleotide , Biological Specimen Banks/organization & administration , Computational Biology/methods , Genetic Association Studies/methods , Humans , Reproducibility of Results
3.
J Inherit Metab Dis ; 42(3): 501-508, 2019 May.
Article in English | MEDLINE | ID: mdl-30715743

ABSTRACT

Citrin deficiency causes neonatal intrahepatic cholestasis (NICCD), failure to thrive and dyslipidemia (FTTDCD), and adult-onset type II citrullinemia (CTLN2). Owing to a defect in the NADH-shuttle, citrin deficiency impairs hepatic glycolysis and de novo lipogenesis leading to hepatic energy deficit. To investigate the physiological role of citrin, we studied the growth of 111 NICCD-affected subjects (51 males and 60 females) and 12 NICCD-unaffected subjects (five males and seven females), including the body weight, height, and genotype. We constructed growth charts using the lambda-mu-sigma (LMS) method. The NICCD-affected subjects showed statistically significant growth impairment, including low birth weight and length, low body weight until 6 to 9 months of age, low height until 11 to 13 years of age, and low body weight in 7 to 12-year-old males and 8-year-old females. NICCD-unaffected subjects showed similar growth impairment, including low birth weight and height, and growth impairment during adolescence. In the third trimester, de novo lipogenesis is required for deposition of body fat and myelination of the developing central nervous system, and its impairment likely causes low birth weight and length. The growth rate is the highest during the first 6 months of life and slows down after 6 months of age, which is probably associated with the onset and recovery of NICCD. Adolescence is the second catch-up growth period, and the proportion and distribution of body fat change depending on age and sex. Characteristic growth impairment in citrin deficiency suggests a significant role of citrin in the catch-up growth via lipogenesis.


Subjects
Calcium-Binding Proteins/metabolism , Citrullinemia/complications , Failure to Thrive/etiology , Growth Disorders/etiology , Organic Anion Transporters/metabolism , Adolescent , Child , Child, Preschool , Cholestasis, Intrahepatic/etiology , Citrullinemia/diagnosis , Dyslipidemias/etiology , Female , Humans , Infant , Infant, Newborn , Japan , Male
4.
Genet Epidemiol ; 41(6): 481-497, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28626864

ABSTRACT

Genome-wide association studies (GWASs) commonly use marginal association tests for each single-nucleotide polymorphism (SNP). Because these tests treat SNPs as independent, their power will be suboptimal for detecting SNPs hidden by linkage disequilibrium (LD). One way to improve power is to use a multiple regression model. However, the large number of SNPs precludes simultaneous fitting with multiple regression, and subset regression is infeasible because of an exorbitant number of candidate subsets. We therefore propose a new method for detecting hidden SNPs having significant yet weak marginal association in a multiple regression model. Our method begins by constructing a bidirected graph locally around each SNP that demonstrates a moderately sized marginal association signal (the focal SNPs). Vertexes correspond to SNPs, and adjacency between vertexes is defined by an LD measure. Subsequently, the method collects from each graph all shortest paths to the focal SNP. Finally, for each shortest path the method fits a multiple regression model to all the SNPs lying in the path and tests the significance of the regression coefficient corresponding to the terminal SNP in the path. Simulation studies show that the proposed method can detect susceptibility SNPs hidden by LD that go undetected with marginal association testing or with existing multivariate methods. When applied to real GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our method detected two groups of SNPs: one in a region containing the apolipoprotein E (APOE) gene, and another in a region close to the semaphorin 5A (SEMA5A) gene.
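
The graph step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ld_graph` and `shortest_paths_to_focal`, the use of r² as the LD measure, and the fixed adjacency threshold are all assumptions made for illustration. As a simplification, it returns one shortest path per SNP (by breadth-first search) rather than all shortest paths, and it leaves out the regression fit on each path.

```python
from collections import deque

def ld_graph(r2, threshold):
    """Adjacency lists: SNPs i and j are adjacent when r2[i][j] >= threshold.

    The r2 matrix and the threshold rule are illustrative assumptions.
    """
    n = len(r2)
    return {i: [j for j in range(n) if j != i and r2[i][j] >= threshold]
            for i in range(n)}

def shortest_paths_to_focal(graph, focal):
    """Breadth-first search from the focal SNP.

    Returns {terminal_snp: [terminal_snp, ..., focal]} with one shortest
    path for every SNP reachable from the focal SNP.
    """
    parent = {focal: None}
    queue = deque([focal])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:      # first visit gives a shortest path
                parent[v] = u
                queue.append(v)
    paths = {}
    for v in parent:
        if v == focal:
            continue
        path = [v]
        while path[-1] != focal:     # walk back towards the focal SNP
            path.append(parent[path[-1]])
        paths[v] = path
    return paths

# Toy LD matrix for four SNPs: 0-1 and 1-2 are in LD, SNP 3 is isolated.
r2 = [[1.0, 0.9, 0.1, 0.0],
      [0.9, 1.0, 0.8, 0.0],
      [0.1, 0.8, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
paths = shortest_paths_to_focal(ld_graph(r2, 0.5), focal=0)
```

Each returned path lists the SNPs that would enter one multiple regression model, with the first element being the terminal SNP whose coefficient is tested.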


Subjects
Genome-Wide Association Study , Models, Genetic , Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Computer Simulation , Humans , Linkage Disequilibrium , Neuroimaging , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable
5.
Genet Epidemiol ; 40(3): 233-43, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26947266

ABSTRACT

We develop a new genetic prediction method, smooth-threshold multivariate genetic prediction, using single nucleotide polymorphism (SNP) data in genome-wide association studies (GWASs). Our method consists of two stages. At the first stage, unlike the usual discontinuous SNP screening as used in the gene score method, our method continuously screens SNPs based on the output from standard univariate analysis for marginal association of each SNP. At the second stage, the predictive model is built by a generalized ridge regression simultaneously using the screened SNPs with SNP weight determined by the strength of marginal association. Continuous SNP screening by the smooth thresholding not only makes prediction stable but also leads to a closed-form expression of generalized degrees of freedom (GDF). The GDF leads to Stein's unbiased risk estimation (SURE), which enables data-dependent choice of optimal SNP screening cutoff without using cross-validation. Our method is very rapid because a computationally expensive genome-wide scan is required only once, in contrast to penalized regression methods including lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits demonstrate that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and shows comparable or sometimes improved performance relative to the lasso and elastic net, which are known to have good predictive ability but heavy computational cost. Application to whole-genome sequencing (WGS) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) shows that the proposed method has higher predictive power than the gene score and GBLUP methods.
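
The two-stage idea in this abstract (continuous screening on marginal statistics, then a generalized ridge fit) can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the weight function `delta`, the parameter names `cutoff` and `gamma`, and the function names are hypothetical, and the abstract does not give the actual formulas. The only general fact used is that a generalized ridge estimator with a diagonal penalty matrix Λ has the closed form (XᵀX + Λ)⁻¹Xᵀy.

```python
import numpy as np

def marginal_z(X, y):
    """Univariate association statistic per column: sqrt(n) * Pearson r.

    Assumes no column of X is constant.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.sqrt(len(y)) * r

def smooth_threshold_ridge(X, y, cutoff, gamma=1.0):
    """Hypothetical smooth-threshold screening followed by generalized ridge."""
    z = np.abs(marginal_z(X, y))
    # Assumed weight in [0, 1]: delta == 1 drops the variable, and
    # delta near 0 leaves it almost unpenalized.
    delta = np.minimum(1.0, (cutoff / np.maximum(z, 1e-12)) ** (1.0 + gamma))
    keep = delta < 1.0
    beta = np.zeros(X.shape[1])
    if not keep.any():
        return beta
    Xk = X[:, keep] - X[:, keep].mean(axis=0)
    yk = y - y.mean()
    # Per-variable ridge penalty grows continuously as delta approaches 1.
    penalty = np.diag(delta[keep] / (1.0 - delta[keep]))
    beta[keep] = np.linalg.solve(Xk.T @ Xk + penalty, Xk.T @ yk)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] + rng.normal(size=50)
beta_hat = smooth_threshold_ridge(X, y, cutoff=2.0)
```

With `cutoff=0` every weight is zero and the fit reduces to ordinary least squares on centered data. A very large cutoff drops every variable. Intermediate cutoffs shrink weakly associated variables more than strongly associated ones, which is the continuous screening behaviour the abstract describes.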


Subjects
Genome-Wide Association Study/methods , Models, Genetic , Algorithms , Alzheimer Disease/genetics , Genome, Human/genetics , Genomics/methods , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable , Regression Analysis , Reproducibility of Results , Research Design
6.
Am J Hum Genet ; 95(3): 294-300, 2014 Sep 04.
Article in English | MEDLINE | ID: mdl-25152455

ABSTRACT

Charcot-Marie-Tooth disease (CMT) is the most common inherited neuropathy characterized by clinical and genetic heterogeneity. Although more than 30 loci harboring CMT-causing mutations have been identified, many other genes still remain to be discovered for many affected individuals. For two consanguineous families with CMT (axonal and mixed phenotypes), a parametric linkage analysis using genome-wide SNP chip identified a 4.3 Mb region on 12q24 showing a maximum multipoint LOD score of 4.23. Subsequent whole-genome sequencing study in one of the probands, followed by mutation screening in the two families, revealed a disease-specific 5 bp deletion (c.247-10_247-6delCACTC) in a splicing element (pyrimidine tract) of intron 2 adjacent to the third exon of cytochrome c oxidase subunit VIa polypeptide 1 (COX6A1), which is a component of mitochondrial respiratory complex IV (cytochrome c oxidase [COX]), within the autozygous linkage region. Functional analysis showed that expression of COX6A1 in peripheral white blood cells from the affected individuals and COX activity in their EB-virus-transformed lymphoblastoid cell lines were significantly reduced. In addition, Cox6a1-null mice showed significantly reduced COX activity and neurogenic muscular atrophy leading to a difficulty in walking. Those data indicated that COX6A1 mutation causes the autosomal-recessive axonal or mixed CMT.


Subjects
Axons/physiology , Charcot-Marie-Tooth Disease/genetics , Electron Transport Complex IV/genetics , Electron Transport Complex IV/physiology , Genes, Recessive/genetics , Muscular Atrophy/genetics , Mutation/genetics , Adult , Animals , Consanguinity , Electrophysiology , Female , Genetic Linkage , Humans , Lod Score , Male , Mice , Mice, Knockout , Pedigree , Phenotype , RNA Splicing/genetics
7.
Plant Cell ; 26(2): 636-49, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24569769

ABSTRACT

In the Brassicaceae, intraspecific non-self pollen (compatible pollen) can germinate and grow into stigmatic papilla cells, while self-pollen or interspecific pollen is rejected at this stage. However, the mechanisms underlying this selective acceptance of compatible pollen remain unclear. Here, using a cell-impermeant calcium indicator, we showed that the compatible pollen coat contains signaling molecules that stimulate Ca²⁺ export from the papilla cells. Transcriptome analyses of stigmas suggested that autoinhibited Ca²⁺-ATPase13 (ACA13) was induced after both compatible pollination and compatible pollen coat treatment. A complementation test using a yeast Saccharomyces cerevisiae strain lacking major Ca²⁺ transport systems suggested that ACA13 indeed functions as an autoinhibited Ca²⁺ transporter. ACA13 transcription increased in papilla cells and in transmitting tracts after pollination. ACA13 protein localized to the plasma membrane and to vesicles near the Golgi body and accumulated at the pollen tube penetration site after pollination. The stigma of a T-DNA insertion line of ACA13 exhibited reduced Ca²⁺ export, as well as defects in compatible pollen germination and seed production. These findings suggest that stigmatic ACA13 functions in the export of Ca²⁺ to the compatible pollen tube, which promotes successful fertilization.


Subjects
Arabidopsis/enzymology , Arabidopsis/physiology , Brassica rapa/enzymology , Brassica rapa/physiology , Calcium-Transporting ATPases/metabolism , Pollen/enzymology , Pollination/physiology , Arabidopsis/cytology , Arabidopsis/genetics , Biological Assay , Brassica rapa/cytology , Brassica rapa/genetics , Calcium/metabolism , Calcium-Transporting ATPases/antagonists & inhibitors , Crosses, Genetic , DNA, Bacterial/genetics , Gene Deletion , Gene Expression Regulation, Plant , Genetic Complementation Test , Membrane Transport Proteins/metabolism , Mutagenesis, Insertional/genetics , Oligonucleotide Array Sequence Analysis , Organic Chemicals/metabolism , Phenotype , Pollen/cytology , Pollen/ultrastructure , Protein Transport , Saccharomyces cerevisiae/metabolism , Self-Fertilization , Subcellular Fractions/metabolism , Transcription, Genetic
8.
PLoS Genet ; 8(4): e1002625, 2012.
Article in English | MEDLINE | ID: mdl-22496670

ABSTRACT

Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result.


Subjects
Genome-Wide Association Study/statistics & numerical data , Models, Statistical , Humans
9.
Stat Med ; 33(28): 4934-48, 2014 Dec 10.
Article in English | MEDLINE | ID: mdl-25043617

ABSTRACT

In gene-gene interaction analysis using single nucleotide polymorphism (SNP) data, empty cells arise in the genotype contingency table more frequently than in single SNP association studies. Empty cells lead to unidentifiable regression coefficients in regression model fitting. It is unclear whether the degrees of freedom (d.f.) for testing interactions are reduced for such sparse contingency tables. Boolean Operation based Screening and Testing is an exhaustive gene-gene interaction search method in which a fixed d.f. of four (the most conservative choice) is used in the chi-squared null distribution for the likelihood ratio test for gene-gene interactions under a logistic regression model. In this paper, the choice of d.f. is investigated theoretically by introducing a decomposition of type I error. An adaptive method using the observed d.f. can be less conservative than the fixed d.f. method, thereby enhancing power. In simulated data, type I error rates for the adaptive method were usually better controlled under various scenarios for Gaussian linear regression and logistic regression, including prospective and retrospective sampling designs, as well as for artificial data that mimic actual genome-wide SNPs. When the adaptive method was applied to public datasets generated from simulations, it exhibited an improvement in power over the fixed method.


Subjects
Genome-Wide Association Study/methods , Logistic Models , Models, Statistical , Normal Distribution , Polymorphism, Single Nucleotide/genetics , Computer Simulation , Humans , Prospective Studies , Retrospective Studies
10.
J Appl Stat ; 50(7): 1650-1663, 2023.
Article in English | MEDLINE | ID: mdl-37197760

ABSTRACT

Coronavirus disease 2019 (COVID-19) caused by the SARS-CoV-2 virus has spread seriously throughout the world. Predicting the spread, or the number of cases, in the future can facilitate preparation for, and prevention of, a worst-case scenario. To achieve these purposes, statistical modeling using past data is one feasible approach. This paper describes spatio-temporal modeling of COVID-19 case counts in 47 prefectures of Japan using a nonlinear random effects model, where random effects are introduced to capture the heterogeneity of a number of model parameters associated with the prefectures. The negative binomial distribution is frequently used with the Paul-Held random effects model to account for overdispersion in count data; however, the negative binomial distribution is known to be incapable of accommodating extreme observations such as those found in the COVID-19 case count data. We therefore propose use of the beta-negative binomial distribution with the Paul-Held model. This distribution is a generalization of the negative binomial distribution that has attracted much attention in recent years because it can model extreme observations with analytical tractability. The proposed beta-negative binomial model was applied to multivariate count time series data of COVID-19 cases in the 47 prefectures of Japan. Evaluation by one-step-ahead prediction showed that the proposed model can accommodate extreme observations without sacrificing predictive performance.
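
For readers unfamiliar with the beta-negative binomial distribution mentioned in this abstract, a minimal sketch of its probability mass function follows. It uses one common textbook parameterization (a negative binomial with `r` successes whose success probability follows a Beta(`alpha`, `beta`) distribution). The parameterization used in the paper may differ, and the function name `bnb_pmf` is chosen here for illustration only.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bnb_pmf(k, r, alpha, beta):
    """P(X = k) for a beta-negative binomial count k = 0, 1, 2, ...

    Marginal of NB(r, p) with p ~ Beta(alpha, beta):
    Gamma(r+k) / (k! Gamma(r)) * B(alpha+r, beta+k) / B(alpha, beta).
    """
    log_p = (lgamma(r + k) - lgamma(k + 1) - lgamma(r)
             + log_beta(alpha + r, beta + k) - log_beta(alpha, beta))
    return exp(log_p)

# Probabilities over a long range of counts sum to (almost) one.
total = sum(bnb_pmf(k, r=3.0, alpha=5.0, beta=2.0) for k in range(2000))
```

In this parameterization the tail decays polynomially in `k` (roughly like `k` to the power `-(alpha + 1)`) rather than geometrically as in the negative binomial, which is why the distribution can accommodate extreme counts.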

11.
Comput Psychiatr ; 7(1): 14-29, 2023.
Article in English | MEDLINE | ID: mdl-38774640

ABSTRACT

Functional connectivity (FC) and neural excitability may interact to affect symptoms of autism spectrum disorder (ASD). We tested this hypothesis with neural network simulations, and applied it with functional magnetic resonance imaging (fMRI). A hierarchical recurrent neural network embodying predictive processing theory was subjected to a facial emotion recognition task. Neural network simulations examined the effects of FC and neural excitability on changes in neural representations by developmental learning, and eventually on ASD-like performance. Next, by mapping each neural network condition to subject subgroups on the basis of fMRI parameters, the association between ASD-like performance in the simulation and ASD diagnosis in the corresponding subject subgroup was examined. In the neural network simulation, the more homogeneous the neural excitability of the lower-level network, the more ASD-like the performance (reduced generalization and emotion recognition capability). In addition, in homogeneous networks, the higher the FC, the more ASD-like performance, while in heterogeneous networks, the higher the FC, the less ASD-like performance, demonstrating that FC and neural excitability interact. As an underlying mechanism, neural excitability determines the generalization capability of top-down prediction, and FC determines whether the model's information processing will be top-down prediction-dependent or bottom-up sensory-input dependent. In fMRI datasets, ASD was actually more prevalent in subject subgroups corresponding to the network condition showing ASD-like performance. The current study suggests an interaction between FC and neural excitability, and presents a novel framework for computational modeling and biological application of a developmental learning process underlying cognitive alterations in ASD.

12.
BMC Bioinformatics ; 13: 72, 2012 May 03.
Article in English | MEDLINE | ID: mdl-22554139

ABSTRACT

BACKGROUND: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. However, individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure; this is the multiple testing problem, which causes unacceptable false negative results. The number of SNP-SNP pairs is far larger than the sample size, the so-called large p small n problem, which precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. RESULTS: We adopt a recent method for ultrahigh-dimensional variable selection, sure independence screening (SIS), to handle the large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by the variable selection procedure of the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case-control Consortium) data. CONCLUSIONS: Based on the machine-learning principle, the proposed method provides a powerful and flexible genome-wide search for various patterns of gene-gene interaction.


Subjects
Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Software , Algorithms , Artificial Intelligence , Computer Simulation , Gene Expression , Humans , Logistic Models , Models, Genetic
13.
G3 (Bethesda) ; 11(12), 2021 Dec 08.
Article in English | MEDLINE | ID: mdl-34849749

ABSTRACT

We propose a genetic prediction modeling approach for genome-wide association study (GWAS) data that can include not only marginal gene effects but also gene-environment (GxE) interaction effects-i.e., multiplicative effects of environmental factors with genes rather than merely additive effects of each. The proposed approach is a straightforward extension of our previous multiple regression-based method, STMGP (smooth-threshold multivariate genetic prediction), with the new feature being that genome-wide test statistics from a GxE interaction analysis are used to weight the corresponding variants. We develop a simple univariate regression approximation to the GxE interaction effect that allows a direct fit of the STMGP framework without modification. The sparse nature of our model automatically removes irrelevant predictors (including variants and GxE combinations), and the model is able to simultaneously incorporate multiple environmental variables. Simulation studies to evaluate the proposed method in comparison with other modeling approaches demonstrate its superior performance under the presence of GxE interaction effects. We illustrate the usefulness of our prediction model through application to real GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).


Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Computer Simulation , Models, Genetic , Regression Analysis
14.
Sci Rep ; 10(1): 21726, 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33303893

ABSTRACT

The nature of the recovery process of posttraumatic stress disorder (PTSD) symptoms is multifactorial. The Massive Parallel Limitless-Arity Multiple-testing Procedure (MP-LAMP), which was developed to detect significant combinational risk factors comprehensively, was utilized to reveal hidden combinational risk factors to explain the long-term trajectory of the PTSD symptoms. In 624 population-based subjects severely affected by the Great East Japan Earthquake, 61 potential risk factors encompassing sociodemographics, lifestyle, and traumatic experiences were analyzed by MP-LAMP regarding combinational associations with the trajectory of PTSD symptoms, as evaluated by the Impact of Event Scale-Revised score after eight years adjusted by the baseline score. The comprehensive combinational analysis detected 56 significant combinational risk factors, including 15 independent variables, although the conventional bivariate analysis between single risk factors and the trajectory detected no significant risk factors. The strongest association was observed with the combination of short resting time, short walking time, unemployment, and evacuation without preparation (adjusted P value = 2.2 × 10⁻⁴, and raw P value = 3.1 × 10⁻⁹). Although short resting time had no association with the poor trajectory, it had a significant interaction with short walking time (P value = 1.2 × 10⁻³), which was further strengthened by the other two components (P value = 9.7 × 10⁻⁵). Likewise, components that were not associated with a poor trajectory in bivariate analysis were included in every observed significant risk combination due to their interactions with other components. Comprehensive combination detection by MP-LAMP is essential for explaining multifactorial psychiatric symptoms by revealing the hidden combinations of risk factors.


Subjects
Machine Learning , Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/psychology , Adult , Aged , Emergency Shelter , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Rest , Risk , Risk Factors , Severity of Illness Index , Stress Disorders, Post-Traumatic/physiopathology , Unemployment , Walking
15.
Transl Psychiatry ; 10(1): 157, 2020 May 19.
Article in English | MEDLINE | ID: mdl-32427830

ABSTRACT

To solve major limitations in algorithms for the metabolite-based prediction of psychiatric phenotypes, a novel prediction model for depressive symptoms based on nonlinear feature selection machine learning, the Hilbert-Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso) algorithm, was developed and applied to a metabolomic dataset with the largest sample size to date. In total, 897 population-based subjects were recruited from the communities affected by the Great East Japan Earthquake; 306 metabolite features (37 metabolites identified by nuclear magnetic resonance measurements and 269 characterized metabolites based on the intensities from mass spectrometry) were utilized to build prediction models for depressive symptoms as evaluated by the Center for Epidemiologic Studies-Depression Scale (CES-D). The nested fivefold cross-validation was used for developing and evaluating the prediction models. The HSIC Lasso-based prediction model showed better predictive power than the other prediction models, including Lasso, support vector machine, partial least squares, random forest, and neural network. L-leucine, 3-hydroxyisobutyrate, and gamma-linolenyl carnitine frequently contributed to the prediction. We have demonstrated that the HSIC Lasso-based prediction model integrating nonlinear feature selection showed improved predictive power for depressive symptoms based on metabolome data as well as on risk metabolites based on nonlinear statistics in the Japanese population. Further studies should use HSIC Lasso-based prediction models with different ethnicities to investigate the generality of each risk metabolite for predicting depressive symptoms.


Subjects
Depression , Machine Learning , Algorithms , Databases, Factual , Japan
16.
Transl Psychiatry ; 10(1): 294, 2020 Aug 17.
Article in English | MEDLINE | ID: mdl-32826857

ABSTRACT

The accuracy of previous genetic studies in predicting polygenic psychiatric phenotypes has been limited mainly due to the limited power in distinguishing truly susceptible variants from null variants and the resulting overfitting. A novel prediction algorithm, Smooth-Threshold Multivariate Genetic Prediction (STMGP), was applied to improve the genome-based prediction of psychiatric phenotypes by decreasing overfitting through selecting variants and building a penalized regression model. Prediction models were trained using a cohort of 3685 subjects in Miyagi prefecture and validated with an independently recruited cohort of 3048 subjects in Iwate prefecture in Japan. Genotyping was performed using HumanOmniExpressExome BeadChip Arrays. We used the target phenotype of depressive symptoms and simulated phenotypes with varying complexity and various effect-size distributions of risk alleles. The prediction accuracy and the degree of overfitting of STMGP were compared with those of state-of-the-art models (polygenic risk scores, genomic best linear-unbiased prediction, summary-data-based best linear-unbiased prediction, BayesR, and ridge regression). In the prediction of depressive symptoms, compared with the other models, STMGP showed the highest prediction accuracy with the lowest degree of overfitting, although there was no significant difference in prediction accuracy. Simulation studies suggested that STMGP has a better prediction accuracy for moderately polygenic phenotypes. Our investigations suggest the potential usefulness of STMGP for predicting polygenic psychiatric conditions while avoiding overfitting.


Subjects
Multifactorial Inheritance , Polymorphism, Single Nucleotide , Japan , Machine Learning , Models, Genetic , Phenotype
17.
Transl Psychiatry ; 10(1): 290, 2020 Aug 17.
Article in English | MEDLINE | ID: mdl-32807774

ABSTRACT

Autism spectrum disorder (ASD) has phenotypically and genetically heterogeneous characteristics. A simulation study demonstrated that attempts to categorize patients with a complex disease into more homogeneous subgroups could have more power to elucidate hidden heritability. We conducted cluster analyses using the k-means algorithm with a cluster number of 15 based on phenotypic variables from the Simons Simplex Collection (SSC). As a preliminary study, we conducted a conventional genome-wide association study (GWAS) with a data set of 597 ASD cases and 370 controls. In the second step, we divided cases based on the clustering results and conducted GWAS in each of the subgroups vs controls (cluster-based GWAS). We also conducted cluster-based GWAS on another SSC data set of 712 probands and 354 controls in the replication stage. In the preliminary study, which was conducted in conventional GWAS design, we observed no significant associations. In the second step of cluster-based GWASs, we identified 65 chromosomal loci, which included 30 intragenic loci located in 21 genes and 35 intergenic loci that satisfied the threshold of P < 5.0 × 10⁻⁸. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A5, ITPR1, NTM, SDK1, SNCA, and SRRM4. Of these 65 significant chromosomal loci, rs11064685 located within the SRRM4 gene had a significantly different distribution in the cases vs controls in the replication cohort. These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease etiologies. Further cluster validation and replication studies are warranted in larger cohorts.


Subjects
Autism Spectrum Disorder , Autistic Disorder , Autism Spectrum Disorder/genetics , Cluster Analysis , Forkhead Transcription Factors , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Nerve Tissue Proteins , Phenotype , Single Nucleotide Polymorphism
18.
PLoS One ; 14(7): e0219825, 2019.
Article in English | MEDLINE | ID: mdl-31318927

ABSTRACT

Gene-environment (GxE) interaction is one potential explanation for the missing heritability problem. A popular approach to genome-wide environment interaction studies (GWEIS) is based on regression models involving interactions between genetic variants and environment variables. Unfortunately, GWEIS encounters systematically inflated (or deflated) test statistics more frequently than a marginal association study. The problematic behavior may occur due to poor specification of the null model (i.e., the model without genetic effect) in GWEIS. Improved null model specification may resolve the problem, but the investigation requires many time-consuming analyses of genome-wide scans, e.g., by trying out several transformations of the phenotype. It is therefore helpful if we can predict such problematic behavior beforehand. We present a simple closed-form formula to assess problematic behavior of GWEIS under the null hypothesis of no genetic effects. It requires only phenotype, environment variables, and covariates, enabling quick identification of systematic test statistic inflation or deflation. Applied to real data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our formula identified problematic studies among hundreds of GWEIS, each considering one metabolite as the environment variable in the GxE interaction. Our formula is useful for quickly identifying problematic GWEIS without requiring a genome-wide scan.
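
For context, systematic inflation or deflation of test statistics is conventionally diagnosed after a scan with the genomic inflation factor, the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The sketch below shows that conventional diagnostic only; it is not the closed-form formula proposed in the paper, which is not given in the abstract. The function name and the toy statistics are assumptions.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to the null median.

    Values well above 1 suggest inflation, values well below 1 deflation.
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Toy 1-df chi-square statistics standing in for a genome-wide scan.
observed = [0.02, 0.15, 0.4549364, 1.3, 3.9]
lam = genomic_inflation_factor(observed)

# Scaling every statistic by 1.5 mimics a uniformly inflated scan.
inflated = [1.5 * s for s in observed]
lam_inflated = genomic_inflation_factor(inflated)

print(round(lam, 3), round(lam_inflated, 3))
```

Because this diagnostic needs the full set of test statistics, it can only be computed after the scan has been run, which is the cost a pre-scan check is meant to avoid.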


Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Genetic Models , Statistical Models , Algorithms , Alzheimer Disease/genetics , Computer Simulation , Humans
19.
Int J Epidemiol ; 48(4): 1305-1315, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30848787

ABSTRACT

BACKGROUND: Biobanks increasingly collect, process and store omics data alongside more conventional epidemiologic information, necessitating considerable effort in data cleaning. An efficient outlier detection method that reduces manual labour is highly desirable. METHOD: We develop an unsupervised machine-learning method for outlier detection, namely kurPCA, that uses principal component analysis combined with kurtosis to ascertain the existence of outliers. In addition, we propose a novel regression adjustment approach to improve detection, namely the regression adjustment for data by systematic missing patterns (RAMP). RESULT: Application to epidemiological record data in a large-scale biobank (Tohoku Medical Megabank Organization, Japan) shows that a combination of kurPCA and RAMP effectively detects known errors or inconsistent patterns. CONCLUSIONS: The simulation and application results confirm that our methods perform well. The proposed methods are useful for many practical analysis scenarios.
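
The general idea of combining principal component analysis with kurtosis can be sketched as follows. This is an illustration of the underlying intuition only, not the kurPCA algorithm itself, whose details are not given in the abstract. The toy records, the power-iteration routine, and all function names are assumptions. A heavy-tailed distribution of scores on a principal axis (large positive excess kurtosis) hints that a few records lie far from the bulk of the data.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(rows, n_iter=200):
    """Leading principal axis via power iteration on X^T X (rows centered).

    Assumes the starting vector is not orthogonal to the leading axis.
    """
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / (m2 ** 2) - 3.0


def pc1_kurtosis(rows):
    """Excess kurtosis of the scores on the first principal axis."""
    centered = center(rows)
    axis = first_pc(centered)
    scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
    return excess_kurtosis(scores)


# Toy records: 20 regular rows, then the same rows plus one gross error.
clean = [[float(i), float(i % 3)] for i in range(20)]
with_outlier = clean + [[100.0, 1.0]]

kurt_clean = pc1_kurtosis(clean)
kurt_outlier = pc1_kurtosis(with_outlier)
print(kurt_clean < kurt_outlier)
```

In this toy setting the single erroneous record dominates the fourth moment of the scores, so the kurtosis rises sharply once it is added.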


Subjects
Algorithms , Machine Learning , Statistical Models , Surveys and Questionnaires , Humans , Principal Component Analysis
20.
Nat Commun ; 10(1): 5642, 2019 12 18.
Article in English | MEDLINE | ID: mdl-31852890

ABSTRACT

Deep learning algorithms have been successfully used in medical image classification. In the next stage, technology for acquiring explainable knowledge from medical images is highly desired. Here we show that a deep learning algorithm enables automated acquisition of explainable features from diagnostic annotation-free histopathology images. We compare the prediction accuracy of prostate cancer recurrence using our algorithm-generated features with that of diagnosis by expert pathologists using established criteria on 13,188 whole-mount pathology images consisting of over 86 billion image patches. Our method reveals not only findings established by humans but also features that have not been recognized, showing higher accuracy than humans in prognostic prediction. Combining our algorithm-generated features with human-established criteria predicts recurrence more accurately than using either method alone. We confirm the robustness of our method using external validation datasets including 2276 pathology images. This study opens up fields of machine learning analysis for discovering uncharted knowledge.


Subjects
Computer-Assisted Image Processing , Knowledge , Pathology , Algorithms , Automation , Data Compression , Humans , Local Neoplasm Recurrence/diagnostic imaging , Local Neoplasm Recurrence/pathology , ROC Curve