Results 1 - 10 of 10
1.
Genomics ; 116(5): 110910, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39111546

ABSTRACT

This article explores deep learning model design, drawing inspiration from the omnigenic model and genetic heterogeneity concepts, to improve schizophrenia prediction using genotype data. It introduces an innovative three-step approach leveraging neural networks' capabilities to efficiently handle genetic interactions. A locally connected network initially routes input data from variants to their corresponding genes. The second step employs an Encoder-Decoder to capture relationships among identified genes. The final model integrates knowledge from the first two and incorporates a parallel component to consider the effects of additional genes. This expansion enhances prediction scores by considering a larger number of genes. Trained models achieved an average AUC of 0.83, surpassing other genotype-trained models and matching gene expression dataset-based approaches. Additionally, tests on held-out sets reported an average sensitivity of 0.72 and an accuracy of 0.76, aligning with schizophrenia heritability predictions. Moreover, the study addresses genetic heterogeneity challenges by considering diverse population subsets.
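The first step, routing variants to their genes through a locally connected layer, can be pictured as a dense layer whose weight matrix is masked by a variant-to-gene membership matrix, so that each gene unit only sees its own variants. The following is a minimal NumPy sketch under that assumption, not the authors' architecture; the mapping `variant_to_gene`, the `tanh` activation and the helper names are hypothetical.

```python
import numpy as np


def gene_mask(variant_to_gene, n_genes):
    """Binary mask with mask[v, g] = 1 when variant v is assigned to gene g."""
    mask = np.zeros((len(variant_to_gene), n_genes))
    for v, g in enumerate(variant_to_gene):
        mask[v, g] = 1.0
    return mask


def locally_connected_forward(x, weights, bias, mask):
    """One forward pass in which each gene unit only receives its own variants.

    x: (n_samples, n_variants) genotype dosages coded 0/1/2.
    Weights outside the variant-to-gene mask are zeroed before the product.
    """
    return np.tanh(x @ (weights * mask) + bias)


rng = np.random.default_rng(0)
variant_to_gene = [0, 0, 1, 2, 2, 2]  # hypothetical mapping of 6 variants to 3 genes
mask = gene_mask(variant_to_gene, n_genes=3)
weights = rng.normal(size=mask.shape)
bias = np.zeros(3)
x = rng.integers(0, 3, size=(4, 6)).astype(float)  # 4 individuals, 6 variants
genes = locally_connected_forward(x, weights, bias, mask)
print(genes.shape)  # (4, 3)
```

Because the mask is applied inside the forward pass, weights outside it receive zero gradient in an autodiff framework, so the variant-to-gene sparsity pattern is preserved during training.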

2.
Am J Hum Genet ; 106(4): 535-548, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32243820

ABSTRACT

The Million Veteran Program (MVP), initiated by the Department of Veterans Affairs (VA), aims to collect biosamples with consent from at least one million veterans. Presently, blood samples have been collected from over 800,000 enrolled participants. The size and diversity of the MVP cohort, as well as the availability of extensive VA electronic health records, make it a promising resource for precision medicine. MVP is conducting array-based genotyping to provide a genome-wide scan of the entire cohort, in parallel with whole-genome sequencing, methylation, and other 'omics assays. Here, we present the design and performance of the MVP 1.0 custom Axiom array, which was designed and developed as a single assay to be used across the multi-ethnic MVP cohort. A unified genetic quality-control analysis was developed and conducted on an initial tranche of 485,856 individuals, leading to a high-quality dataset of 459,777 unique individuals. A total of 668,418 genetic markers passed quality control and yielded high-quality genotypes for both common and rare variants. We confirmed that, with non-European individuals making up nearly 30% of the cohort, MVP's ancestral diversity substantially exceeds that of other large biobanks. We also demonstrated the quality of the MVP dataset by replicating established genetic associations with height in individuals of European American and African American ancestry. This dataset has been made available to approved MVP researchers for genome-wide association studies and other downstream analyses. Further data releases will become available as recruitment at the VA continues and the cohort expands in both size and diversity.


Subjects
Ethnicity/genetics , Aged , Aged, 80 and over , Cohort Studies , Female , Genetic Markers/genetics , Genome-Wide Association Study/methods , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Precision Medicine/methods , Quality Control , Veterans , Whole Genome Sequencing/methods
3.
J Neurol Neurosurg Psychiatry ; 90(7): 761-767, 2019 07.
Article in English | MEDLINE | ID: mdl-30824631

ABSTRACT

OBJECTIVE: Benign multiple sclerosis (BMS) is often defined as an Expanded Disability Status Scale (EDSS) score of ≤3.0 after ≥15 years of disease duration. The clinical relevance of this classification remains unclear, because benign patients may suffer other impairments and may advance towards a progressive course. Our objective was therefore to investigate holistically the factors associated with BMS and its long-term prognosis. METHODS: Benign cases were identified in the Swedish Multiple Sclerosis registry. Baseline clinical data, demographic features and the influence of major multiple sclerosis (MS) risk alleles on the likelihood of a benign course were investigated. Physical disability (EDSS), cognitive function (Symbol Digit Modalities Test; SDMT), and self-reported and socioeconomic differences between benign and non-benign patients were evaluated using generalised estimating equation models. RESULTS: A total of 11,222 patients (2420 benign/8802 non-benign) were included. Benign patients were more likely to be female, to be younger at MS onset, to have fewer relapses within the first 2 and 5 years from onset and to recover fully from the first relapse (p<0.001). No association was found between benign/non-benign course and either human leucocyte antigen (HLA) DRB1*15:01 carriership (OR: 0.97, 95% CI: 0.86 to 1.09) or absence of HLA-A*02:01 (OR: 0.99, 95% CI: 0.87 to 1.11). Non-benign patients accumulated an extra 0.04 (95% CI 0.03 to 0.04, p<0.001) EDSS points per year, lost an extra 0.3 (95% CI -0.39 to -0.18, p<0.001) SDMT points per year and deteriorated faster in self-reported impact and socioeconomic measures (p<0.001). CONCLUSION: Patients with BMS have a better disease course, progressing more slowly at the group level in all respects. The lack of an association with major genetic risk factors indicates that the MS course is most likely influenced by environmental factor(s) or by genetic factors outside the HLA region.


Subjects
Multiple Sclerosis/pathology , Adult , Age of Onset , Alleles , Disability Evaluation , Disease Progression , Female , Humans , Male , Multiple Sclerosis/diagnosis , Multiple Sclerosis/epidemiology , Multiple Sclerosis/genetics , Prognosis , Sex Factors , Sweden
4.
BMC Bioinformatics ; 18(1): 173, 2017 Mar 16.
Article in English | MEDLINE | ID: mdl-28302061

ABSTRACT

BACKGROUND: The current gold standard among dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). PCA is so dominant that other methods are usually missing from the analyst's toolbox and are therefore rarely applied. RESULTS: We present a modern dimension reduction method called 'Invariant Coordinate Selection' (ICS) and its application to high-throughput genotype data. The more commonly known Independent Component Analysis (ICA) is, in this framework, just a special case of ICS. We apply ICS to both a simulated and a real dataset. First, we use the simulated data to demonstrate some deficiencies of PCA and to show how ICS recovers the correct subgroups. Second, we apply ICS to a chicken dataset and detect two subgroups there as well. These subgroups are then investigated further with respect to their genotypes to provide additional evidence of the biological relevance of the detected subgroup division. We also compare the performance of ICS with five other popular dimension reduction methods. CONCLUSION: ICS was able to detect subgroups in data where PCA failed to detect anything. Hence, we promote the application of ICS to high-throughput genotype data in addition to the established PCA. Especially in statistical programming environments such as R, its application adds no computational burden to the analysis pipeline.
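ICS compares two scatter matrices: the data are whitened with the first, and the eigenvectors of the second (computed on the whitened data) define the invariant coordinates. The sketch below uses one common pair, the regular covariance and the fourth-moment scatter `cov4`; with this pair ICS coincides with FOBI, a form of ICA, which fits the remark above that ICA is a special case. This is a minimal NumPy illustration, not the authors' code; the function name and the simulated data are hypothetical.

```python
import numpy as np


def ics_cov_cov4(X):
    """Invariant coordinates from the scatter pair (cov, cov4).

    Returns the coordinates (rows = samples) and the generalized kurtosis
    values, sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # First scatter: covariance (1/n normalisation), used to whiten the data.
    S1 = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(S1)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric S1^(-1/2)
    Z = Xc @ W
    # Second scatter: fourth-moment matrix of the whitened data,
    # weighting each sample by its squared Mahalanobis distance.
    r2 = np.sum(Z ** 2, axis=1)
    S2 = (Z * r2[:, None]).T @ Z / (n * (p + 2))
    kurt, U = np.linalg.eigh(S2)
    order = np.argsort(kurt)[::-1]
    return Z @ U[:, order], kurt[order]


rng = np.random.default_rng(1)
# Hypothetical data: 300 samples x 5 features, first 50 samples shifted on feature 0.
X = rng.normal(size=(300, 5))
X[:50, 0] += 4.0
coords, kurt = ics_cov_cov4(X)
print(kurt.round(3))
```

Subgroup structure tends to appear in the first or last invariant coordinates (unusually large or small generalized kurtosis), whereas PCA orders directions by variance alone.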


Subjects
Algorithms , Animals , Chickens/genetics , Cluster Analysis , Genotype , Principal Component Analysis
5.
Biometrics ; 73(3): 1029-1041, 2017 09.
Article in English | MEDLINE | ID: mdl-28182851

ABSTRACT

We propose a method for visualizing genetic assignment data by characterizing the distribution of genetic profiles for each candidate source population. This method enhances the assignment method of Rannala and Mountain (1997) by calculating appropriate graph positions for individuals for which some genetic data are missing. An individual with missing data is positioned in the distributions of genetic profiles for a population according to its estimated quantile based on its available data. The quantiles of the genetic profile distribution for each population are calculated by approximating the cumulative distribution function (CDF) using the saddlepoint method, and then inverting the CDF to get the quantile function. The saddlepoint method also provides a way to visualize assignment results calculated using the leave-one-out procedure. This new method offers an advance upon assignment software such as geneclass2, which provides no visualization method, and is biologically more interpretable than the bar charts produced by the program structure. We show results from simulated data and apply the methods to microsatellite genotype data from ship rats (Rattus rattus) captured on the Great Barrier Island archipelago, New Zealand. The visualization method makes it straightforward to detect features of population structure and to judge the discriminative power of the genetic data for assigning individuals to source populations.
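The genetic profile being visualized is an individual's assignment log-likelihood, summed over loci. The sketch below assumes a symmetric Dirichlet prior with parameter 1/k per allele (k alleles at a locus), updated with each population's allele counts, which is our reading of the Rannala and Mountain approach; missing loci simply contribute nothing, so an individual is scored on its available data. The saddlepoint quantile step is not shown, and the function names and allele counts are hypothetical.

```python
import math


def genotype_log_prob(genotype, allele_counts):
    """Log posterior-predictive probability of one diploid genotype at one locus.

    Assumes a symmetric Dirichlet(1/k, ..., 1/k) prior on the population's
    allele frequencies (k = number of alleles) updated with the observed
    allele counts. genotype is a pair of allele indices, or None if missing.
    """
    if genotype is None:  # missing locus: contributes nothing to the likelihood
        return 0.0
    a, b = genotype
    k = len(allele_counts)
    alpha = [c + 1.0 / k for c in allele_counts]  # posterior Dirichlet parameters
    total = sum(alpha)  # equals n + 1 for n sampled alleles
    if a == b:
        p = alpha[a] * (alpha[a] + 1.0) / (total * (total + 1.0))
    else:
        p = 2.0 * alpha[a] * alpha[b] / (total * (total + 1.0))
    return math.log(p)


def profile_log_likelihood(genotypes, population_counts):
    """Sum of per-locus log probabilities; missing loci are skipped."""
    return sum(
        genotype_log_prob(g, counts)
        for g, counts in zip(genotypes, population_counts)
    )


# Hypothetical allele counts for one population at 3 loci, and one
# individual whose genotype at the second locus is missing.
pop_counts = [[12, 8], [5, 5, 10], [18, 2]]
individual = [(0, 1), None, (0, 0)]
print(profile_log_likelihood(individual, pop_counts))
```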


Subjects
Software , Animals , Genetics, Population , Genotype , Microsatellite Repeats , New Zealand , Rats
6.
Mol Genet Metab Rep ; 33: 100911, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36092251

ABSTRACT

Background: Autosomal recessive Gaucher disease (GD) is likely underdiagnosed in many countries. Because the number of diagnosed GD patients in Finland is relatively low and the true prevalence is not currently known, it was hypothesized that undiagnosed GD patients may exist in Finland. Our previous study demonstrated the applicability of the Gaucher Earlier Diagnosis Consensus point-scoring system (GED-C PSS; Mehta et al., 2019) and of Finnish biobank data and specimens to the automated point scoring of large populations. An indicative point-score range for Finnish GD patients was determined, but undiagnosed patients were not identified, partly because of the high number of high-score subjects combined with a lack of suitable samples for diagnostics in the assessed biobank population. The current study extended the screening to another biobank and evaluated the feasibility of using the automated GED-C PSS together with single nucleotide polymorphism (SNP) chip genotype data from the FinnGen study of biobank sample donors to identify undiagnosed GD patients in Finland. Furthermore, the applicability of formalin-fixed paraffin-embedded (FFPE) tissues and DNA restoration to next-generation sequencing (NGS) of the GBA gene was tested. Methods: Previously diagnosed Finnish GD patients eligible for the study and up to 45,100 sample donors in Helsinki Biobank (HBB) were point scored. The GED-C point scoring, adjusted to local data, was automated but also partly verified manually for GD patients. The SNP chip genotype data for rare GBA variants was visually assessed. FFPE tissues of GD patients were obtained from HBB and Biobank Borealis of Northern Finland (BB). Results: Three previously diagnosed GD patients and one patient previously treated for GD-related features were included. A genetic diagnosis was confirmed for the patient treated for GD-related features. The GED-C point score of the GD patients was 12.5-22.5 in the current study.
Across the eight Finnish GD patients of the previous and current studies, the score thus ranges from 6 to 22.5 points per patient. In the automated point scoring of the HBB subpopulation (N ≈ 45,100), overall scores ranged from 0 to 17.5, with 0.77% (346/45,100) of the subjects scoring ≥10 points. The analysis of SNP chip genotype data identified the diagnosed GD patients, but no potential undiagnosed patients with a GED-C score and/or GBA genotype indicative of GD were discovered. Restoration of the FFPE tissue DNA improved the quality of GBA NGS, and pathogenic GBA variants were confirmed in five of six unrestored and in all four restored FFPE DNA samples. Discussion: These findings imply that the prevalence of diagnosed patients (~1:325,000) may indeed correspond to the true prevalence of GD in Finland. The SNP chip genotype data is a valuable tool that complements screening with the GED-C PSS, especially if the genotyping pipeline is tuned for rare variants. These proof-of-concept biobank tools can be adapted to other rare genetic diseases.
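Automated point scoring of a biobank population reduces to summing points per subject and counting how many subjects reach a cut-off. The sketch below is an illustration only: the feature names and point values are hypothetical placeholders, not the GED-C scoring table, while the ≥10 cut-off is the one reported above. The last line checks the reported share: 346 of 45,100 subjects is 0.77%.

```python
# Hypothetical feature weights for illustration only; these are NOT the
# published GED-C point values.
HYPOTHETICAL_WEIGHTS = {"feature_a": 4.0, "feature_b": 3.0, "feature_c": 1.5}


def point_score(features):
    """Sum the points of the recorded features; unknown features score 0."""
    return sum(HYPOTHETICAL_WEIGHTS.get(f, 0.0) for f in features)


def share_at_or_above(scores, threshold=10.0):
    """Fraction of subjects whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Arithmetic behind the reported share of high-score subjects in HBB.
print(f"{346 / 45_100:.2%}")  # 0.77%
```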

7.
Genes (Basel) ; 12(12)2021 11 29.
Article in English | MEDLINE | ID: mdl-34946876

ABSTRACT

An episodic nervous system disorder triggered by strenuous exercise, termed border collie collapse (BCC), exists in border collies and related breeds. The genetic basis of BCC is unknown, but it is believed to be a complex genetic disorder. Our goal was to estimate the heritability (h2SNP) of BCC, define its underlying genetic architecture, and identify associated genomic loci using dense whole-genome single-nucleotide polymorphism (SNP) genotyping data. Genotype data were obtained for ~440,000 SNPs from 343 border collies (168 BCC cases and 175 controls). h2SNP was calculated to be 49-61%, depending on the estimated BCC prevalence. A total of 2407 SNPs across the genome accounted for nearly all of the h2SNP of BCC, with an estimated 2003 SNPs of small effect, 349 SNPs of moderate effect, and 56 SNPs of large effect. Genome-wide association analyses identified significantly associated loci on chromosomes 1, 6, 11, 20, and 28, which accounted for ~5% of the total BCC h2SNP. We conclude that BCC is a moderately to highly heritable complex polygenic disease resulting from contributions from hundreds to thousands of genetic variants with variable effect sizes. Understanding how much of the BCC phenotype is determined by genetics, and whether major gene mutations are likely to exist, informs veterinarians and working/stock dog communities of the true nature of this condition.
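The dependence of h2SNP on the assumed prevalence typically comes from converting a case-control estimate from the observed 0/1 scale to the liability scale. The abstract does not name the conversion used, so the sketch below assumes the widely used transformation of Lee et al. (2011). The case fraction is taken from the abstract (168 cases among 343 dogs); the observed-scale value and the prevalences are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert heritability from the observed 0/1 scale to the liability scale.

    Uses the transformation of Lee et al. (2011):
        h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2),
    where K is the population prevalence, P the fraction of cases in the
    sample, and z the standard normal density at the liability threshold.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    K, P = prevalence, case_fraction
    return h2_obs * K**2 * (1.0 - K) ** 2 / (P * (1.0 - P) * z**2)


case_fraction = 168 / 343  # cases / genotyped dogs, from the abstract
h2_obs = 0.30  # hypothetical observed-scale estimate, for illustration only
for K in (0.01, 0.02, 0.05):  # hypothetical prevalence values
    print(K, round(observed_to_liability(h2_obs, K, case_fraction), 3))
```

Over this range of prevalences, the same observed-scale estimate maps to a larger liability-scale heritability as the assumed prevalence increases, which is why a reported h2SNP is given as a range.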


Subjects
Dog Diseases/genetics , Inheritance Patterns , Nervous System Diseases/veterinary , Physical Exertion , Animals , Ataxia/genetics , Ataxia/physiopathology , Ataxia/veterinary , Dog Diseases/physiopathology , Dogs , Female , Genetic Predisposition to Disease , Genome-Wide Association Study/veterinary , Genotype , Lameness, Animal/genetics , Lameness, Animal/physiopathology , Male , Nervous System Diseases/genetics , Nervous System Diseases/physiopathology , Polymorphism, Single Nucleotide
8.
BMC Syst Biol ; 11(1): 99, 2017 Oct 26.
Article in English | MEDLINE | ID: mdl-29073909

ABSTRACT

BACKGROUND: One goal of personalized medicine is to leverage the emerging tools of data science to guide medical decision-making. Achieving this with disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), an intermediate endophenotype of coronary artery disease (CAD). METHODS: Model inputs were derived from advanced cases in the ClinSeq® discovery cohort (n=16) and the FHS replication cohort (n=36) within the 89th-99th percentile range of CAC scores, and from age-matched controls (ClinSeq® n=16, FHS n=36) with no detectable CAC (all subjects were Caucasian males). These inputs included clinical variables and the genotypes of the 56 single nucleotide polymorphisms (SNPs) ranked highest by their nominal correlation with the advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the area under the receiver operating characteristic curve (ROC-AUC). RESULTS: RF models trained and tested with clinical variables yielded ROC-AUC values of 0.69 and 0.61 in the discovery and replication cohorts, respectively. In contrast, in both cohorts, the set of SNPs derived from the discovery cohort was highly predictive (ROC-AUC ≥0.85), with no significant change in predictive performance upon integration of clinical and genotype variables. Using the 21 SNPs that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq® data and tested with FHS data, and obtained high predictive accuracy (ROC-AUC=0.80-0.85) with several topologies. Several CAD- and "vascular aging"-related biological processes were enriched in the network of genes constructed from the predictive SNPs. CONCLUSIONS: We identified a molecular network predictive of advanced coronary calcium using genotype data from the ClinSeq® and FHS cohorts.
Our results illustrate that machine learning tools, which utilize complex interactions between disease predictors intrinsic to the pathogenesis of polygenic disorders, hold promise for deriving predictive disease models and networks.
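ROC-AUC, the performance measure used in this study, equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. A minimal sketch of that pairwise definition, with made-up scores:

```python
def roc_auc(case_scores, control_scores):
    """Probability that a random case scores above a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted scores for 3 cases and 3 controls.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9, about 0.889
```

The double loop is quadratic in sample size, which is unproblematic for cohorts of a few dozen subjects; for large samples the equivalent rank-based (Mann-Whitney) formula is preferable.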


Subjects
Calcium/metabolism , Computational Biology/methods , Coronary Vessels/metabolism , Genotype , Cohort Studies , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Coronary Artery Disease/metabolism , Female , Humans , Male , Models, Statistical , Neural Networks, Computer , Phenotype , Polymorphism, Single Nucleotide
9.
Methods Mol Biol ; 1533: 279-297, 2017.
Article in English | MEDLINE | ID: mdl-27987178

ABSTRACT

The goal of the Gramene database (www.gramene.org) is to empower the plant research community to conduct comparative genomics studies across model plants and crops by employing a phylogenetic framework and orthology-based projections. The Gramene database (release #49) provides resources for comparative plant genomics, including well-annotated plant genomes (39 complete reference genomes and six partial genomes), genetic or structural variation data for 14 plant species, pathways for 58 plant species, and gene expression data for 14 species including Arabidopsis, rice, maize, soybean and wheat (fetched from the EBI-EMBL Gene Expression Atlas database). Gramene also facilitates visualization and analysis of user-defined data in the context of species-specific Genome Browsers or pathways. This chapter describes basic navigation for Gramene users and illustrates how they can use the genome section to analyze gene expression and nucleotide variation data generated in their labs. This includes (1) uploading and displaying genomic data on a Genome Browser track, (2) analyzing variation data with the online Variant Effect Predictor (VEP) tool for smaller data sets, and (3) using stand-alone Perl scripts and command-line protocols for variant effect prediction on larger data sets.


Subjects
Computational Biology/methods , Crops, Agricultural/genetics , Databases, Genetic , Genomics , Plants/genetics , Web Browser , Genetic Variation , Genomics/methods , Software , User-Computer Interface
10.
Front Genet ; 5: 214, 2014.
Article in English | MEDLINE | ID: mdl-25071839

ABSTRACT

Obesity is a complex condition with exponentially rising prevalence worldwide, linked with severe diseases such as type 2 diabetes. Its economic and welfare consequences have raised interest in a better understanding of its biological and genetic background. To date, whole-genome investigations focusing on single genetic variants have achieved limited success, and the importance of including genetic interactions is becoming evident. Here, the aim was to perform an integrative genomic analysis in an F2 pig resource population that was constructed to maximize genetic variation in obesity-related phenotypes and was genotyped using the 60K SNP chip. First, Genome-Wide Association (GWA) analysis was performed on the Obesity Index to locate candidate genomic regions, which were further validated using combined Linkage Disequilibrium Linkage Analysis and investigated by evaluating haplotype blocks. We built Weighted Interaction SNP Hub (WISH) and differentially wired (DW) networks using genotypic correlations among obesity-associated SNPs resulting from the GWA analysis. GWA results and SNP modules detected by the WISH and DW analyses were further investigated by functional enrichment analyses. The functional annotation of SNPs revealed several genes associated with obesity, e.g., NPC2 and OR4D10. Moreover, gene enrichment analyses identified several significantly associated pathways, over and above the GWA study results, that may influence obesity and obesity-related diseases, e.g., metabolic processes. WISH networks based on genotypic correlations allowed further identification of various gene ontology terms and pathways related to obesity and related traits that were not identified by the GWA study.
In conclusion, this is the first study to develop a (genetic) obesity index and employ systems genetics in a porcine model to provide important insights into the complex genetic architecture associated with obesity and many biological pathways that underlie it.
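WISH networks are built from genotypic correlations among SNPs. The sketch below assumes a WGCNA-style weighted construction, in which absolute pairwise correlations are raised to a soft-threshold power β and SNPs with the largest summed connection strength are treated as hubs; this is our assumption about the general approach, not the authors' pipeline, and β = 6 and the random genotype matrix are illustrative only.

```python
import numpy as np


def snp_network(genotypes, beta=6):
    """Weighted SNP-SNP network from genotypic correlations.

    genotypes: (n_animals, n_snps) dosage matrix coded 0/1/2; every SNP is
    assumed polymorphic (a constant column would give undefined correlations).
    Returns the adjacency matrix |r|**beta with a zero diagonal and the
    connectivity (row sums) of each SNP.
    """
    r = np.corrcoef(genotypes, rowvar=False)
    adjacency = np.abs(r) ** beta
    np.fill_diagonal(adjacency, 0.0)
    connectivity = adjacency.sum(axis=1)
    return adjacency, connectivity


rng = np.random.default_rng(2)
G = rng.integers(0, 3, size=(100, 8))  # hypothetical: 100 animals, 8 SNPs
adjacency, connectivity = snp_network(G)
hubs = np.argsort(connectivity)[::-1][:3]  # indices of the 3 most connected SNPs
print(hubs)
```

Raising correlations to a power, rather than applying a hard cut-off, keeps the network weighted while suppressing weak correlations that are likely to be noise.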
