Results 1 - 20 of 34
1.
BMC Bioinformatics ; 24(1): 411, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37907836

ABSTRACT

BACKGROUND: Identifying variants associated with complex traits is a challenging task in genetic association studies due to linkage disequilibrium (LD) between genetic variants and population stratification, unrelated to the disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants. RESULTS: To overcome this, we propose CluStrat, which corrects for complex arbitrarily structured populations while leveraging the linkage disequilibrium induced distances between genetic markers. It performs an agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat on WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in Schizophrenia and Myocardial Infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans. CONCLUSIONS: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.


Subjects
Polymorphism, Single Nucleotide; Humans; Genetic Markers; Linkage Disequilibrium; Phenotype; Cluster Analysis
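The CluStrat abstract above contrasts the Mahalanobis distance with the Euclidean distance when markers are correlated. The following is a minimal sketch of that contrast only, not the CluStrat implementation: the function name `mahalanobis_2d`, the two-marker setup, and the 0.9 correlation are assumptions chosen for illustration.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points x and y under covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - y[0], x[1] - y[1])
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(quad)

# Toy covariance: two markers with correlation 0.9 (an assumed stand-in for strong LD).
cov = [[1.0, 0.9], [0.9, 1.0]]
origin = (0.0, 0.0)
along = (1.0, 1.0)     # displacement that follows the correlation
against = (1.0, -1.0)  # displacement that goes against the correlation

d_along = mahalanobis_2d(origin, along, cov)
d_against = mahalanobis_2d(origin, against, cov)

# The Euclidean distance cannot tell the two displacements apart.
print(math.dist(origin, along), math.dist(origin, against))
# The Mahalanobis distance is larger for the displacement against the correlation.
print(d_along, d_against)
```

A distance of this kind could then be passed to any agglomerative clustering routine; how CluStrat builds and uses its distance matrix is described in the paper itself.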
2.
Neuroimage ; 284: 120466, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37995919

ABSTRACT

Alterations in subcortical brain structure volumes have been found to be associated with several neurodegenerative and psychiatric disorders. At the same time, genome-wide association studies (GWAS) have identified numerous common variants associated with brain structure. In this study, we integrate these findings, aiming to identify proteins, metabolites, or microbes that have a putative causal association with subcortical brain structure volumes via a two-sample Mendelian randomization approach. This method uses genetic variants as instrumental variables to identify potentially causal associations between an exposure and an outcome. The exposure data that we analyzed comprised genetic associations for 2994 plasma proteins, 237 metabolites, and 103 microbial genera. The outcome data included GWAS data for seven subcortical brain structure volumes: accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus. Eleven proteins and six metabolites were found to have a significant association with subcortical structure volumes, with nine proteins and five metabolites replicated using independent exposure data. We found causal associations between accumbens volume and plasma protease C1 inhibitor as well as a strong association between putamen volume and Agouti signaling protein. Among metabolites, urate had the strongest association with thalamic volume. No significant associations were detected between the microbial genera and subcortical brain structure volumes. We also observed significant enrichment for biological processes such as proteolysis, regulation of the endoplasmic reticulum apoptotic signaling pathway, and negative regulation of DNA binding. Our findings provide insights into the mechanisms through which brain volumes may be affected in the pathogenesis of neurodevelopmental and psychiatric disorders and point to potential treatment targets for disorders that are associated with subcortical brain structure volumes.


Subjects
Genome-Wide Association Study; Mendelian Randomization Analysis; Humans; Genome-Wide Association Study/methods; Multiomics; Brain/diagnostic imaging; Brain/pathology; Biomarkers; Magnetic Resonance Imaging/methods
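The abstract above uses genetic variants as instruments in a two-sample Mendelian randomization design but does not name the estimator. As a hedged illustration, the sketch below shows two standard textbook estimators, the single-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) estimate; whether the study used these is an assumption, and all numbers are made up.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect estimate from one variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate across variants, weighted by 1 / se_outcome^2."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return num / den, (1.0 / den) ** 0.5  # (estimate, standard error)

# Made-up summary statistics for three variants (illustration only).
bx = [0.10, 0.20, 0.15]    # variant effects on the exposure (first sample)
by = [0.05, 0.10, 0.075]   # variant effects on the outcome (second sample)
se = [0.01, 0.02, 0.015]   # standard errors of the outcome effects

estimate, std_err = ivw_estimate(bx, by, se)
print(estimate, std_err)
```

In this toy input every outcome effect is exactly half the exposure effect, so both estimators agree; with real summary statistics the per-variant ratios differ and the IVW weights matter.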
3.
Mol Biol Evol ; 38(5): 1809-1819, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33481022

ABSTRACT

India represents an intricate tapestry of population substructure shaped by geography, language, culture, and social stratification. Although geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a data set of 891 individuals from 90 well-defined groups. Bringing together geography, genetics, and demographic factors, we developed Correlation Optimization of Genetics and Geodemographics to build a model that explains the observed population genetic substructure. We show that shared language along with social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we discover the ethnic groups that best capture the diverse genetic substructure using a ridge leverage score statistic. Integrating data from India with an additional data set of 1,323 individuals from 50 Eurasian populations, we find that Indo-European and Dravidian speakers of India show shared genetic drift with Europeans, whereas the Tibeto-Burman speaking tribal groups have maximum shared genetic drift with East Asians.


Subjects
Ethnicity/genetics; Genetic Variation; Language; Models, Genetic; Sociological Factors; Geography; Humans; India
4.
Bioinformatics ; 35(19): 3679-3683, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30957838

ABSTRACT

MOTIVATION: Principal Component Analysis is a key tool in the study of population structure in human genetics. As modern datasets become increasingly large, traditional approaches based on loading the entire dataset in the system memory (Random Access Memory) become impractical and out-of-core implementations are the only viable alternative. RESULTS: We present TeraPCA, a C++ implementation of the Randomized Subspace Iteration method to perform Principal Component Analysis of large-scale datasets. TeraPCA can be applied both in-core and out-of-core and is able to successfully operate even on commodity hardware with a system memory of just a few gigabytes. Moreover, TeraPCA has minimal dependencies on external libraries and only requires a working installation of the BLAS and LAPACK libraries. When applied to a dataset containing a million individuals genotyped on a million markers, TeraPCA requires <5 h (in multi-threaded mode) to accurately compute the 10 leading principal components. An extensive experimental analysis shows that TeraPCA is both fast and accurate and is competitive with current state-of-the-art software for the same task. AVAILABILITY AND IMPLEMENTATION: Source code and documentation are both available at https://github.com/aritra90/TeraPCA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software; Genetic Variation; Genotype; Humans; Principal Component Analysis
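The TeraPCA abstract above names the Randomized Subspace Iteration method. The sketch below illustrates the general idea on a tiny dense matrix using only the Python standard library; it is not TeraPCA's code, and the helper names (`matmul`, `transpose`, `orthonormalize`, `subspace_iteration`), the iteration count, and the Gram-Schmidt step are assumptions made for illustration. Preprocessing such as centering the genotype matrix and any out-of-core blocking are omitted.

```python
import random

def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthonormalize(Q):
    """Modified Gram-Schmidt on the columns of Q."""
    basis = []
    for v in transpose(Q):
        for u in basis:
            dot = sum(x * y for x, y in zip(u, v))
            v = [x - dot * y for x, y in zip(v, u)]
        norm = sum(x * x for x in v) ** 0.5
        basis.append([x / norm for x in v])
    return transpose(basis)

def subspace_iteration(A, k, iters=20, seed=0):
    """Approximate an orthonormal basis of the top-k right singular subspace of A."""
    rng = random.Random(seed)
    n = len(A[0])
    At = transpose(A)
    # Start from a random Gaussian n x k block.
    Q = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)])
    for _ in range(iters):
        # Apply A^T A to the block, then re-orthonormalize for numerical stability.
        Q = orthonormalize(matmul(At, matmul(A, Q)))
    return Q

# Toy matrix with singular values 10, 1 and 0.1 along the coordinate axes.
A = [[10.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
Q1 = subspace_iteration(A, k=1)
Q2 = subspace_iteration(A, k=2)
print(Q1)
```

Each pass touches the data only through products with `A` and its transpose, which is the property that makes this family of methods suitable for reading a large matrix in blocks from disk.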
5.
IEEE Trans Inf Theory ; 66(8): 5003-5021, 2020.
Article in English | MEDLINE | ID: mdl-33746243

ABSTRACT

The von Neumann entropy, named after John von Neumann, is an extension of the classical concept of entropy to the field of quantum mechanics. From a numerical perspective, von Neumann entropy can be computed simply by computing all eigenvalues of a density matrix, an operation that could be prohibitively expensive for large-scale density matrices. We present and analyze three randomized algorithms to approximate von Neumann entropy of real density matrices: our algorithms leverage recent developments in the Randomized Numerical Linear Algebra (RandNLA) literature, such as randomized trace estimators, provable bounds for the power method, and the use of random projections to approximate the eigenvalues of a matrix. All three algorithms come with provable accuracy guarantees and our experimental evaluations support our theoretical findings showing considerable speedup with small loss in accuracy.
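The abstract above notes that the von Neumann entropy can be computed exactly from all eigenvalues of the density matrix, which is the expensive baseline the randomized algorithms approximate. The sketch below shows only that exact baseline for a 2x2 real symmetric density matrix; the function names and the choice of the natural logarithm are assumptions, and none of the paper's three randomized algorithms is reproduced here.

```python
import math

def eigenvalues_sym_2x2(m):
    """Eigenvalues of a real symmetric 2x2 matrix via the closed-form formula."""
    (a, b), (_, d) = m
    trace = a + d
    det = a * d - b * b
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)

def von_neumann_entropy(eigenvalues):
    """H = -sum(lambda * ln(lambda)), with the convention 0 * ln(0) = 0."""
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 0.0)

mixed = [[0.5, 0.0], [0.0, 0.5]]  # maximally mixed state
pure = [[0.5, 0.5], [0.5, 0.5]]   # rank-one (pure) state

h_mixed = von_neumann_entropy(eigenvalues_sym_2x2(mixed))
h_pure = von_neumann_entropy(eigenvalues_sym_2x2(pure))
print(h_mixed, h_pure)
```

For an n x n density matrix the same formula applies to all n eigenvalues, and computing them is the step that becomes prohibitively expensive at scale.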

6.
Ann Hum Genet ; 83(6): 373-388, 2019 11.
Article in English | MEDLINE | ID: mdl-31192450

ABSTRACT

The medieval history of several populations often suffers from scarcity of contemporary records resulting in contradictory and sometimes biased interpretations by historians. This is the situation with the population of the island of Crete, which remained relatively undisturbed until the Middle Ages when multiple wars, invasions, and occupations by foreigners took place. Historians have considered the effects of the occupation of Crete by the Arabs (in the 9th and 10th centuries C.E.) and the Venetians (in the 13th to the 17th centuries C.E.) on the local population. To obtain insights into such effects from a genetic perspective, we studied representative samples from 17 Cretan districts using the Illumina 1 million or 2.5 million arrays and compared the Cretans to the populations of origin of the medieval conquerors and settlers. Highlights of our findings include (1) small genetic contributions from the Arab occupation to the extant Cretan population, (2) low genetic contribution of the Venetians to the extant Cretan population, and (3) evidence of a genetic relationship between the Cretans and Central, Northern, and Eastern Europeans, which could be explained by the settlement on the island of tribes of northern origin during the medieval period. Our results show how the interaction between genetics and the historical record can help shed light on the historical record.


Subjects
Genetics, Population; White People/genetics; Crosses, Genetic; Databases, Genetic; Ethnicity/genetics; Genetic Variation; Genetics, Population/history; Genome, Human; Genomics/methods; Genotype; Geography; Greece; History, Medieval; Human Migration; Humans; White People/history
7.
BMC Bioinformatics ; 18(1): 341, 2017 Jul 17.
Article in English | MEDLINE | ID: mdl-28716001

ABSTRACT

BACKGROUND: The increasing volume and complexity of high-throughput genomic data make analysis and prioritization of variants difficult for researchers with limited bioinformatics skills. Variant Ranker allows researchers to rank identified variants and determine the most confident variants for experimental validation. RESULTS: We describe Variant Ranker, a simple, user-friendly web-based tool for ranking, filtering and annotation of coding and non-coding variants. Variant Ranker facilitates the identification of causal variants based on novelty, effect and annotation information. The algorithm implements and aggregates multiple prediction algorithm scores, conservation scores, allelic frequencies, clinical information and additional open-source annotations using accessible databases via ANNOVAR. The available information for a variant is transformed into user-specified weights, which are in turn encoded into the ranking algorithm. Through its different modules, users can (i) rank a list of variants; (ii) perform genotype filtering for case-control samples; (iii) filter large amounts of high-throughput data based on user custom filter requirements and apply different models of inheritance; and (iv) perform downstream functional enrichment analysis through network visualization. Using networks, users can identify clusters of genes that belong to multiple ontology categories (like pathways, gene ontology, disease categories) and therefore expedite scientific discoveries. We demonstrate the utility of Variant Ranker to identify causal genes using real and synthetic datasets. Our results indicate that Variant Ranker exhibits excellent performance by correctly identifying and ranking the candidate genes. CONCLUSIONS: Variant Ranker is a freely available web server at http://paschou-lab.mbg.duth.gr/Software.html. This tool will enable users to prioritise potentially causal variants and is applicable to a wide range of sequencing data.


Subjects
Genetic Variation; Genomics/methods; Software; Algorithms; Gene Frequency; Gene Ontology; Genotype; Humans; Internet; Sequence Analysis, DNA
8.
Proc Natl Acad Sci U S A ; 111(25): 9211-6, 2014 Jun 24.
Article in English | MEDLINE | ID: mdl-24927591

ABSTRACT

The Neolithic populations, which colonized Europe approximately 9,000 years ago, presumably migrated from the Near East to Anatolia and from there to Central Europe through Thrace and the Balkans. An alternative route would have been island hopping across the Southern European coast. To test this hypothesis, we analyzed genome-wide DNA polymorphisms on populations bordering the Mediterranean coast and from Anatolia and mainland Europe. We observe a striking structure correlating genes with geography around the Mediterranean Sea with characteristic east to west clines of gene flow. Using population network analysis, we also find that the gene flow from Anatolia to Europe was through the Dodecanese, Crete, and the Southern European coast, compatible with the hypothesis that a maritime coastal route was mainly used for the migration of Neolithic farmers to Europe.


Subjects
Gene Flow; Genome-Wide Association Study; Polymorphism, Genetic; Emigration and Immigration/history; Female; Genetics, Medical; History, Ancient; Humans; Male; Mediterranean Region
9.
Neural Comput ; 28(4): 716-42, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26890353

ABSTRACT

We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets, namely a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.

10.
medRxiv ; 2023 Apr 03.
Article in English | MEDLINE | ID: mdl-37066330

ABSTRACT

Alterations in subcortical brain structure volumes have been found to be associated with several neurodegenerative and psychiatric disorders. At the same time, genome-wide association studies (GWAS) have identified numerous common variants associated with brain structure. In this study, we integrate these findings, aiming to identify proteins, metabolites, or microbes that have a putative causal association with subcortical brain structure volumes via a two-sample Mendelian randomization approach. This method uses genetic variants as instrumental variables to identify potentially causal associations between an exposure and an outcome. The exposure data that we analyzed comprised genetic associations for 2,994 plasma proteins, 237 metabolites, and 103 microbial genera. The outcome data included GWAS data for seven subcortical brain structure volumes: accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus. Eleven proteins and six metabolites were found to have a significant association with subcortical structure volumes. We found causal associations between amygdala volume and granzyme A as well as an association between accumbens volume and plasma protease C1 inhibitor. Among metabolites, urate had the strongest association with thalamic volume. No significant associations were detected between the microbial genera and subcortical brain structure volumes. We also observed significant enrichment for biological processes such as proteolysis, regulation of the endoplasmic reticulum apoptotic signaling pathway, and negative regulation of DNA binding. Our findings provide insights into the mechanisms through which brain volumes may be affected in the pathogenesis of neurodevelopmental and psychiatric disorders and point to potential treatment targets for disorders that are associated with subcortical brain structure volumes.

11.
BMC Genom Data ; 24(1): 70, 2023 11 20.
Article in English | MEDLINE | ID: mdl-37986041

ABSTRACT

Complex disorders are caused by a combination of genetic, environmental and lifestyle factors, and their prevalence can vary greatly across different populations. The extent to which genetic risk, as identified by Genome Wide Association Study (GWAS), correlates with disease prevalence in different populations has not been investigated systematically. Here, we studied 14 different complex disorders and explored whether polygenic risk scores (PRS) based on current GWAS correlate with disease prevalence within Europe and around the world. A clear variation in GWAS-based genetic risk was observed based on ancestry and we identified populations that have a higher genetic liability for developing certain disorders. We found that for four out of the 14 studied disorders, PRS significantly correlates with disease prevalence within Europe. We also found significant correlations between worldwide disease prevalence and PRS for eight of the studied disorders, with Multiple Sclerosis genetic risk having the highest correlation with disease prevalence. Based on current GWAS results, the cross-population differences in genetic risk for certain disorders can potentially be used to understand differences in disease prevalence and identify populations with the highest genetic liability. The study highlights both the limitations of PRS based on current GWAS and the fact that, in some cases, PRS may already have high predictive power. This could be due to the genetic architecture of specific disorders or increased GWAS power in some cases.


Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Prevalence; Risk Factors; Multifactorial Inheritance/genetics
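The abstract above correlates polygenic risk scores with disease prevalence across populations. As a rough sketch of the two quantities involved, the code below computes a PRS as the usual weighted sum of risk-allele dosages and a Pearson correlation between two lists; the function names and every number are hypothetical, and the study's actual scoring and correlation pipeline is not described in the abstract.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages (0, 1 or 2) by GWAS effect sizes."""
    return sum(d * b for d, b in zip(dosages, effect_sizes))

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical effect sizes for three variants and one individual's dosages.
effects = [0.2, -0.1, 0.05]
score = polygenic_risk_score([2, 1, 0], effects)

# Hypothetical population-level mean PRS and prevalence values (illustration only).
mean_prs = [0.10, 0.25, 0.30, 0.45]
prevalence = [0.010, 0.018, 0.021, 0.030]
r = pearson(mean_prs, prevalence)
print(score, r)
```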
12.
Front Immunol ; 14: 1147573, 2023.
Article in English | MEDLINE | ID: mdl-37809097

ABSTRACT

Introduction: Autoimmune disorders (ADs) are a group of about 80 disorders that occur when self-attacking autoantibodies are produced due to failure in the self-tolerance mechanisms. ADs are polygenic disorders and associations with genes both in the human leukocyte antigen (HLA) region and outside of it have been described. Previous studies have shown that they are highly comorbid with shared genetic risk factors, while epidemiological studies revealed associations between various lifestyle and health-related phenotypes and ADs. Methods: Here, for the first time, we performed a comparative polygenic risk score (PRS) - Phenome Wide Association Study (PheWAS) for 11 different ADs (Juvenile Idiopathic Arthritis, Primary Sclerosing Cholangitis, Celiac Disease, Multiple Sclerosis, Rheumatoid Arthritis, Psoriasis, Myasthenia Gravis, Type 1 Diabetes, Systemic Lupus Erythematosus, Vitiligo Late Onset, Vitiligo Early Onset) and 3,254 phenotypes available in the UK Biobank that include a wide range of socio-demographic, lifestyle and health-related outcomes. Additionally, we investigated the genetic relationships of the studied ADs, calculating their genetic correlation and conducting cross-disorder GWAS meta-analyses for the observed AD clusters. Results: In total, we identified 508 phenotypes significantly associated with at least one AD PRS. 272 phenotypes were significantly associated after excluding variants in the HLA region from the PRS estimation. Through genetic correlation and genetic factor analyses, we identified four genetic factors that run across studied ADs. Cross-trait meta-analyses within each factor revealed pleiotropic genome-wide significant loci. Discussion: Overall, our study confirms the association of different factors with genetic susceptibility for ADs and reveals novel observations that need to be further explored.


Subjects
Autoimmune Diseases; Diabetes Mellitus, Type 1; Vitiligo; Humans; Autoimmune Diseases/genetics; Diabetes Mellitus, Type 1/genetics; HLA Antigens; Phenotype; Polymorphism, Single Nucleotide
13.
Transl Psychiatry ; 13(1): 69, 2023 02 23.
Article in English | MEDLINE | ID: mdl-36823209

ABSTRACT

Tourette Syndrome (TS) is a complex neurodevelopmental disorder characterized by vocal and motor tics lasting more than a year. It is highly polygenic in nature with both rare and common previously associated variants. Epidemiological studies have shown TS to be correlated with other phenotypes, but large-scale phenome-wide analyses in biobank-level data have not been performed to date. In this study, we used the summary statistics from the latest meta-analysis of TS to calculate the polygenic risk score (PRS) of individuals in the UK Biobank data and applied a Phenome Wide Association Study (PheWAS) approach to determine the association of disease risk with a wide range of phenotypes. A total of 57 traits were found to be significantly associated with TS polygenic risk, including multiple psychosocial factors and mental health conditions such as anxiety disorder and depression. Additional associations were observed with complex non-psychiatric disorders such as Type 2 diabetes, heart palpitations, and respiratory conditions. Cross-disorder comparisons of phenotypic associations with genetic risk for other childhood-onset disorders (e.g., attention deficit hyperactivity disorder [ADHD], autism spectrum disorder [ASD], and obsessive-compulsive disorder [OCD]) indicated an overlap in associations between TS and these disorders. ADHD and ASD had a similar direction of effect with TS while OCD had an opposite direction of effect for all traits except mental health factors. Sex-specific PheWAS analysis identified differences in the associations with TS genetic risk between males and females. Type 2 diabetes and heart palpitations were significantly associated with TS risk in males but not in females, whereas diseases of the respiratory system were associated with TS risk in females but not in males. This analysis provides further evidence of shared genetic and phenotypic architecture of different complex disorders.


Subjects
Attention Deficit Disorder with Hyperactivity; Autism Spectrum Disorder; Diabetes Mellitus, Type 2; Tourette Syndrome; Male; Female; Humans; Tourette Syndrome/genetics; Autism Spectrum Disorder/genetics; Attention Deficit Disorder with Hyperactivity/genetics; Risk Factors
14.
Ann Hum Genet ; 76(6): 472-83, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23061745

ABSTRACT

Studies of the genomic structure of the Greek population and Southeastern Europe are limited, despite the central position of the area as a gateway for human migrations into Europe. HapMap has provided a unique tool for the analysis of human genetic variation. Europe is represented by the CEU (Northwestern Europe) and the TSI populations (Tuscan Italians from Southern Europe), which serve as reference for the design of genetic association studies. Furthermore, genetic association findings are often transferred to unstudied populations. Although initial studies support the fact that the CEU can, in general, be used as reference for the selection of tagging SNPs in European populations, this has not been extensively studied across Europe. We set out to explore the genomic structure of the Greek population (56 individuals) and compare it to the HapMap TSI and CEU populations. We studied 1112 SNPs (27 regions, 13 chromosomes). Although the HapMap European populations are, in general, a good reference for the Greek population, regions of population differentiation do exist and results should not be light-heartedly generalized. We conclude that, perhaps due to the individual evolutionary history of each genomic region, geographic proximity is not always a perfect guide for selecting a reference population for an unstudied population.


Subjects
Genomics; HapMap Project; White People/genetics; Alleles; Ethnicity/genetics; Gene Frequency; Genome-Wide Association Study; Greece/ethnology; Humans; Polymorphism, Single Nucleotide
15.
Proc Natl Acad Sci U S A ; 106(3): 697-702, 2009 Jan 20.
Article in English | MEDLINE | ID: mdl-19139392

ABSTRACT

Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.
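The CUR abstract above selects columns that exhibit high statistical leverage. A minimal sketch of the scoring step follows, assuming the top-k right singular vectors are already available as the columns of an orthonormal matrix `V` (one row per data column); the function name and the toy `V` are illustrative assumptions, and the sampling step that turns scores into a CUR decomposition is left out.

```python
def normalized_leverage_scores(V):
    """Score for data column j: squared norm of row j of V, divided by k.

    V is an n x k list of rows whose columns are orthonormal, so the
    scores are non-negative and sum to 1.
    """
    k = len(V[0])
    return [sum(x * x for x in row) / k for row in V]

# Toy orthonormal basis for n = 4 data columns and k = 2 (assumed, not from real data).
s = 3.0 ** -0.5
V = [
    [1.0, 0.0],
    [0.0, s],
    [0.0, s],
    [0.0, s],
]

scores = normalized_leverage_scores(V)
# Rank data columns from most to least influential on the rank-k fit.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Here the first data column carries half of the total leverage because one singular direction is supported on it alone, which is the kind of column a leverage-based selection would favor.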

16.
Sci Rep ; 12(1): 8242, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35581276

ABSTRACT

The emergence of genome-wide association studies (GWAS) has led to the creation of large repositories of human genetic variation, creating enormous opportunities for genetic research and worldwide collaboration. Methods that are based on GWAS summary statistics seek to leverage such records, overcoming barriers that often exist in individual-level data access while also offering significant computational savings. Such summary-statistics-based applications include GWAS meta-analysis, with and without sample overlap, and case-case GWAS. We compare the performance of leading methods for summary-statistics-based genomic analysis and also introduce a novel framework that can unify usual summary-statistics-based implementations via the reconstruction of allelic and genotypic frequencies and counts (ReACt). First, we evaluate ASSET, METAL, and ReACt using both synthetic and real data for GWAS meta-analysis (with and without sample overlap) and find that, while all three methods are comparable in terms of power and error control, ReACt and METAL are faster than ASSET by a factor of at least a hundred. We then proceed to evaluate the performance of ReACt vs an existing method for case-case GWAS and show comparable performance, with ReACt requiring minimal underlying assumptions and being more user-friendly. Finally, ReACt allows us to evaluate, for the first time, an implementation for calculating polygenic risk score (PRS) for groups of cases and controls based on summary statistics. Our work demonstrates the power of GWAS summary-statistics-based methodologies and the proposed novel method provides a unifying framework and allows further extension of possibilities for researchers seeking to understand the genetics of complex disease.


Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Alleles; Genotype; Humans; Phenotype
17.
Res Comput Mol Biol ; 13278: 86-106, 2022 May.
Article in English | MEDLINE | ID: mdl-36649383

ABSTRACT

Principal component analysis (PCA) is a widely used dimensionality reduction technique in machine learning and multivariate statistics. To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA). In this paper, we present ThreSPCA, a provably accurate algorithm based on thresholding the Singular Value Decomposition for the SPCA problem, without imposing any restrictive assumptions on the input covariance matrix. Our thresholding algorithm is conceptually simple, much faster than the current state of the art, and performs well in practice. When applied to genotype data from the 1000 Genomes Project, ThreSPCA is faster than previous benchmarks, at least as accurate, and leads to a set of interpretable biomarkers, revealing genetic diversity across the world.
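The ThreSPCA abstract above describes obtaining sparse loadings by thresholding the Singular Value Decomposition. The sketch below shows one simple thresholding heuristic under stated assumptions: it takes an already computed leading singular vector, keeps its `k` largest-magnitude entries, and renormalizes. The function name, the keep-top-k rule, and the example vector are assumptions; the exact thresholding rule and guarantees of ThreSPCA are given in the paper, not here.

```python
def threshold_vector(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest, rescale to unit norm."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    sparse = [v[i] if i in keep else 0.0 for i in range(len(v))]
    norm = sum(x * x for x in sparse) ** 0.5
    return [x / norm for x in sparse]

# Hypothetical dense loading vector over five features (for example, five markers).
dense = [0.7, -0.6, 0.3, 0.2, 0.1]
sparse = threshold_vector(dense, k=2)
print(sparse)
```

The surviving nonzero entries identify the features that drive the component, which is what makes the sparse loadings easier to interpret than the dense ones.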

18.
Front Psychiatry ; 13: 958688, 2022.
Article in English | MEDLINE | ID: mdl-36072455

ABSTRACT

Tourette syndrome (TS) is characterized by multiple motor and vocal tics, and high comorbidity rates with other neuropsychiatric disorders. Obsessive compulsive disorder (OCD), attention deficit hyperactivity disorder (ADHD), autism spectrum disorders (ASDs), major depressive disorder (MDD), and anxiety disorders (AXDs) are among the most prevalent TS comorbidities. To date, studies on TS brain structure and function have been limited in size with efforts mostly fragmented. This leads to low statistical power and discordant results due to differences in approaches, and hinders the ability to stratify patients according to clinical parameters and investigate comorbidity patterns. Here, we present the scientific premise, perspectives, and key goals that have motivated the establishment of the Enhancing Neuroimaging Genetics through Meta-Analysis for TS (ENIGMA-TS) working group. The ENIGMA-TS working group is an international collaborative effort bringing together a large network of investigators who aim to understand brain structure and function in TS and dissect the underlying neurobiology that leads to observed comorbidity patterns and clinical heterogeneity. Previously collected TS neuroimaging data will be analyzed jointly and integrated with TS genomic data, as well as equivalently large and already existing studies of highly comorbid OCD, ADHD, ASD, MDD, and AXD. Our work highlights the power of collaborative efforts and transdiagnostic approaches, and points to the existence of different TS subtypes. ENIGMA-TS will offer large-scale, high-powered studies that will lead to important insights toward understanding brain structure and function and genetic effects in TS and related disorders, and the identification of biomarkers that could help inform improved clinical practice.

19.
Ann Hum Genet ; 75(6): 707-22, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21902678

ABSTRACT

The linkage disequilibrium structure of the human genome allows identification of small sets of single nucleotide polymorphisms (SNPs) (tSNPs) that efficiently represent dense sets of markers. This structure can be translated into linear algebraic terms as evidenced by the well documented principal components analysis (PCA)-based methods. Here we apply, for the first time, PCA-based methodology for efficient genome-wide tSNP selection and explore the linear algebraic structure of the human genome. Our algorithm divides the genome into contiguous nonoverlapping windows of high linear structure. Coupling this novel window definition with a PCA-based tSNP selection method, we analyze 2.5 million SNPs from the HapMap phase 2 dataset. We show that 10-25% of these SNPs suffice to predict the remaining genotypes with over 95% accuracy. A comparison with other popular methods in the ENCODE regions indicates significant genotyping savings. We evaluate the portability of genome-wide tSNPs across a diverse set of populations (HapMap phase 3 dataset). Interestingly, African populations are good reference populations for the rest of the world. Finally, we demonstrate the applicability of our approach in a real genome-wide disease association study. The chosen tSNP panels can be used toward genotype imputation using either a simple regression-based algorithm or more sophisticated genotype imputation methods.


Subjects
Genotype; Polymorphism, Single Nucleotide; Principal Component Analysis; Algorithms; Black People; Genome, Human; Genome-Wide Association Study; HapMap Project; Humans
20.
J Med Genet ; 47(12): 835-47, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20921023

ABSTRACT

BACKGROUND AND AIMS: The analysis of large-scale genetic data from thousands of individuals has revealed the fact that subtle population genetic structure can be detected at levels that were previously unimaginable. Using the Human Genome Diversity Panel as reference (51 populations - 650,000 SNPs), this work describes a systematic evaluation of the resolution that can be achieved for the inference of genetic ancestry, even when small panels of genetic markers are used. METHODS AND RESULTS: A comprehensive investigation of human population structure around the world is undertaken by leveraging the power of Principal Components Analysis (PCA). The problem is dissected into hierarchical steps and a decision tree for the prediction of individual ancestry is proposed. A complete leave-one-out validation experiment demonstrates that, using all available SNPs, assignment of individuals to their self-reported populations of origin is essentially perfect. Ancestry informative genetic markers are selected using two different metrics (In and correlation with PCA scores). A thorough cross-validation experiment indicates that, in most cases here, the number of SNPs needed for ancestry inference can be successfully reduced to less than 0.1% of the original 650,000 while retaining close to 100% accuracy. This reduction can be achieved using a novel clustering-based redundancy removal algorithm that is also introduced here. Finally, the applicability of our suggested SNP panels is tested on HapMap Phase 3 populations. CONCLUSION: The proposed methods and ancestry informative marker panels, in combination with the increasingly comprehensive databases of human genetic variation, open new horizons in a variety of fields, ranging from the study of human evolution and population history, to medical genetics and forensics.


Subjects
Genealogy and Heraldry; Genetics, Population; Internationality; Databases, Genetic; Decision Trees; Genetic Markers; Genetic Variation; Genome, Human/genetics; Haplotypes/genetics; Humans; Oligonucleotide Array Sequence Analysis; Principal Component Analysis; Reproducibility of Results