2.
Sci Rep ; 13(1): 11662, 2023 07 19.
Article in English | MEDLINE | ID: mdl-37468507

ABSTRACT

In this paper we characterize the performance of linear models trained via widely used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance depends most strongly on the size of the training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity agree with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative is projected to achieve an AUC of [Formula: see text] for asthma and a correlation of [Formula: see text] for height in a Taiwanese population. This is above the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank-trained predictors applied to a European population.
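The sparse linear predictors in this abstract are fit with LASSO (L1-penalized regression). As an illustration only, the sketch below implements the textbook cyclic coordinate-descent solver for the objective (1/(2n))·||y - Xb||² + λ·||b||₁ in plain Python. It is not the authors' code; the function names, the toy genotype matrix, and the penalty values are hypothetical. A polygenic score is then the weighted sum of an individual's genotype features.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of n rows (individuals), each a list of p features (e.g. standardized
    SNP dosages); y: list of n phenotype values. Returns the coefficient list b.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    # Residuals r = y - X b; b starts at zero, so r = y.
    r = [float(v) for v in y]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column: coefficient stays 0
            # Correlation of column j with the partial residual (b_j's term added back).
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
    return b


def polygenic_score(genotypes, b):
    """Linear predictor: weighted sum of one individual's genotype features."""
    return sum(g * w for g, w in zip(genotypes, b))


# Toy data: phenotype depends only on the first feature (y = 2 * x0).
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
b = lasso_coordinate_descent(X, y, lam=0.5)
```

Raising λ drives more coefficients exactly to zero, which is the sparsity property the abstract relates to compressed sensing. In practice λ would be tuned on held-out data, and genotype columns are usually standardized first (an assumption in this sketch).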


Subjects
Asthma , Biological Specimen Banks , Humans , Machine Learning , Forecasting , Algorithms
3.
Sci Rep ; 13(1): 376, 2023 01 07.
Article in English | MEDLINE | ID: mdl-36611071

ABSTRACT

We use UK Biobank and a unique IVF family dataset (including genotyped embryos) to investigate sibling variation in both phenotype and genotype. We compare phenotype (disease status, height, blood biomarkers) and genotype (polygenic scores, polygenic health index) distributions among siblings to those in the general population. As expected, the between-sibling standard deviation in polygenic scores is [Formula: see text] times smaller than in the general population, but variation is still significant. As previously demonstrated, this residual variation allows for substantial benefit from polygenic screening in IVF. Differences in sibling genotypes result from distinct recombination patterns in sexual reproduction. We develop a novel sibling-pair method for detecting recombination breaks via statistical discontinuities. The new method is used to construct a dataset of 1.44 million recombination events, which may be useful in further study of meiosis.


Subjects
Multifactorial Inheritance , Siblings , Humans , Multifactorial Inheritance/genetics , Biological Specimen Banks , Genotype , Phenotype , Recombination, Genetic , United Kingdom/epidemiology , DNA , Fertilization in Vitro , Genome-Wide Association Study
5.
Sci Rep ; 12(1): 18173, 2022 10 28.
Article in English | MEDLINE | ID: mdl-36307513

ABSTRACT

We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including coronary artery disease, type 1 and type 2 diabetes, and schizophrenia. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index using odds ratios and selection experiments in unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (with no significant risk increases) and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among ten individuals (highest index score vs. average) is roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and the genetic architecture of human disease conditions.
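The index described above is a weighted sum of per-disease polygenic risk scores, with weights reflecting impact on life expectancy. A minimal sketch, assuming standardized PRS values and a sign flip so that a higher index means lower weighted risk; the disease keys and weights are made-up placeholders, not the paper's values. Selection among candidates (e.g., ten individuals) then amounts to taking the highest-scoring one.

```python
def health_index(prs, weights):
    """Hypothetical polygenic health index for one individual.

    prs: dict disease -> standardized polygenic risk score (z-score).
    weights: dict disease -> weight, e.g. estimated life-years lost if affected.
    The weighted sum is negated so that a higher index means lower weighted risk.
    Every disease in `weights` must have an entry in `prs`.
    """
    return -sum(weights[d] * prs[d] for d in weights)


def select_highest(candidates, weights):
    """Return the position of the candidate with the highest health index."""
    scores = [health_index(c, weights) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Placeholder weights and scores, for illustration only.
weights = {"CAD": 3.0, "T2D": 2.0}
a = {"CAD": 1.0, "T2D": 0.0}
b = {"CAD": -0.5, "T2D": 0.5}
best = select_highest([a, b], weights)
```

Candidate `b` wins here because its lower risk for the more heavily weighted condition outweighs its slightly elevated risk for the other.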


Subjects
Diabetes Mellitus, Type 1 , Diabetes Mellitus, Type 2 , Humans , Siblings , Multifactorial Inheritance , Life Expectancy , Risk Reduction Behavior , Risk Factors
6.
Methods Mol Biol ; 2467: 421-446, 2022.
Article in English | MEDLINE | ID: mdl-35451785

ABSTRACT

Decoding the genome confers the capability to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.


Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Genomics , Genotype , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide
7.
Genes (Basel) ; 12(8)2021 07 21.
Article in English | MEDLINE | ID: mdl-34440279

ABSTRACT

Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used to identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at lower disease risk, even when the two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (relative to one chosen at random) can be quantified using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking, with better results than the current morphology-based selection method. We review the advances described above and discuss related ethical considerations.
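The relative risk reduction from choosing the lower-PRS embryo can be estimated from sibling pairs, as the abstract describes. A minimal sketch, assuming each pair is given as two (PRS, affected) tuples: the risk under random choice is the average disease status within each pair, the risk under selection is the status of the lower-PRS sibling, and RRR = (risk_random - risk_selected) / risk_random. The input format and function name are hypothetical.

```python
def relative_risk_reduction(pairs):
    """Estimate RRR of picking the lower-PRS sibling versus picking one at random.

    pairs: list of ((prs_a, affected_a), (prs_b, affected_b)); affected is 1 or 0.
    """
    n = len(pairs)
    # Expected disease rate if one sibling per pair is chosen at random.
    random_risk = sum((a[1] + b[1]) / 2 for a, b in pairs) / n
    # Disease rate if the lower-PRS sibling is chosen (ties: first listed wins).
    selected_risk = sum(min(a, b, key=lambda s: s[0])[1] for a, b in pairs) / n
    if random_risk == 0:
        raise ValueError("no affected individuals; RRR is undefined")
    return (random_risk - selected_risk) / random_risk


# Toy sibling pairs: (PRS z-score, affected).
pairs = [
    ((0.1, 0), (1.2, 1)),
    ((-0.3, 0), (0.4, 0)),
    ((0.9, 1), (0.2, 0)),
    ((0.5, 1), (-1.0, 1)),
]
rrr = relative_risk_reduction(pairs)
```

In this toy set random choice gives a 50% disease rate and lower-PRS selection gives 25%, so the RRR is 0.5.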


Subjects
Embryo, Mammalian , Genetic Predisposition to Disease , Genetic Testing/ethics , Genetic Testing/methods , Multifactorial Inheritance , Humans
8.
Genes (Basel) ; 12(7)2021 06 29.
Article in English | MEDLINE | ID: mdl-34209487

ABSTRACT

We use UK Biobank data to train predictors, from SNP genotype, for 65 blood and urine markers such as HDL, LDL, lipoprotein A, and glycated haemoglobin. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait yet produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores (BMRS). Individuals who are at high risk (e.g., odds ratio >5× population average) can be identified using biomarkers alone for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs, in UKB evaluation, as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors that result from the concatenation of the learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.


Subjects
Atherosclerosis/epidemiology , Biomarkers/blood , Biomarkers/urine , Cardiovascular Diseases/epidemiology , Lipoprotein(a)/blood , Adult , Atherosclerosis/blood , Atherosclerosis/urine , Biological Specimen Banks , Calcium/blood , Calcium/urine , Cardiovascular Diseases/blood , Female , Heart Disease Risk Factors , Hemoglobins/genetics , Humans , Lipoproteins, HDL/blood , Lipoproteins, LDL/blood , Machine Learning , Male , Middle Aged , Multifactorial Inheritance/genetics , Risk Assessment , United Kingdom/epidemiology , United States/epidemiology
9.
Sci Rep ; 10(1): 13190, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32764582

ABSTRACT

We test 26 polygenic predictors using tens of thousands of genetic siblings from the UK Biobank (UKB), for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood and exhibit negligible population stratification relative to each other. Therefore, the ability to predict differences in disease risk or complex trait values between siblings is a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that typically most of the predictive power persists in between-sibling designs. In the case of disease risk, we test the extent to which a higher polygenic risk score (PRS) identifies the affected sibling, and also compute Relative Risk Reduction as a function of risk score threshold. For quantitative traits, we examine between-sibling differences in trait values as a function of predicted differences and compare to performance in non-sibling pairs. Example results: given 1 sibling with a normal-range PRS (< 84th percentile, < +1 SD) and 1 sibling with a high PRS (top few percentiles, i.e., > +2 SD), the predictors identify the affected sibling about 70-90% of the time across a variety of disease conditions, including Breast Cancer, Heart Attack, Diabetes, etc. The higher-PRS sibling is the case 55-65% of the time. For quantitative traits such as height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
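The sibling test described above can be sketched as a count over case/control sibling pairs: among pairs with exactly one affected sibling, how often is the higher-PRS sibling the case? The optional thresholds mirror the "< +1 SD" and "> +2 SD" cut-offs quoted in the abstract, assuming PRS values are z-scores. This is an illustrative sketch with a hypothetical function name and input format, not the authors' analysis code.

```python
def higher_prs_is_case_rate(pairs, low_max=None, high_min=None):
    """Fraction of discordant sibling pairs in which the higher-PRS sibling is the case.

    pairs: list of ((prs_a, affected_a), (prs_b, affected_b)); affected is 1 or 0.
    low_max / high_min: optional z-score cut-offs, e.g. 1.0 and 2.0 to keep only
    pairs where one sibling is below +1 SD and the other above +2 SD.
    """
    hits = total = 0
    for a, b in pairs:
        if a[1] == b[1]:
            continue  # keep only pairs with exactly one affected sibling
        hi, lo = (a, b) if a[0] > b[0] else (b, a)
        if low_max is not None and not lo[0] < low_max:
            continue
        if high_min is not None and not hi[0] > high_min:
            continue
        total += 1
        hits += hi[1]  # 1 if the higher-PRS sibling is the affected one
    return hits / total if total else float("nan")


# Toy sibling pairs: (PRS z-score, affected).
pairs = [
    ((2.5, 1), (0.0, 0)),
    ((2.1, 0), (0.5, 1)),
    ((0.3, 1), (-0.2, 0)),
    ((1.0, 0), (1.0, 0)),
    ((3.0, 1), (1.5, 0)),
]
all_pairs = higher_prs_is_case_rate(pairs)
outlier_pairs = higher_prs_is_case_rate(pairs, low_max=1.0, high_min=2.0)
```

Without thresholds the concordant pair is skipped and 3 of 4 discordant pairs count as hits; with the +1 SD / +2 SD cut-offs only two pairs qualify.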


Subjects
Computational Biology , Disease/genetics , Genetic Predisposition to Disease/genetics , Phenotype , Siblings , Biological Specimen Banks , Female , Humans , Male , Polymorphism, Single Nucleotide
10.
Sci Rep ; 10(1): 12055, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32694572

ABSTRACT

Genomic prediction of complex human traits (e.g., height, cognitive ability, bone density) and disease risks (e.g., breast cancer, diabetes, heart disease, atrial fibrillation) has advanced considerably in recent years. Using data from the UK Biobank, predictors have been constructed using penalized algorithms that favor sparsity, i.e., that use as few genetic variants as possible. We analyze the specific genetic variants (SNPs) utilized in these predictors, which can number from dozens to as many as thirty thousand. We find that the fraction of SNPs in or near genic regions varies widely by phenotype. For the majority of disease conditions studied, a large fraction of the variance is accounted for by SNPs outside of coding regions. The state of these SNPs cannot be determined from exome-sequencing data. This suggests that exome data alone will miss much of the heritability for these traits; i.e., existing PRS cannot be computed from exome data alone. We also study the fraction of SNPs and of variance that is in common between pairs of predictors. The DNA regions used in disease risk predictors constructed so far seem to be largely disjoint (with a few interesting exceptions), suggesting that individual genetic disease risks are largely uncorrelated. It seems possible in theory for an individual to be a low-risk outlier in all conditions simultaneously.
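The overlap analysis described above can be sketched with each predictor represented as a mapping from SNP identifier to effect size. The variance share relies on an assumption stated in the code: genotypes standardized and mutually uncorrelated, so SNP j contributes β_j² to the score variance. Real predictors include SNPs in linkage disequilibrium, so this is only an approximation; the function names and SNP IDs are placeholders.

```python
def shared_snp_fraction(pred_a, pred_b):
    """Fraction of predictor A's SNPs that also appear in predictor B.

    pred_a, pred_b: dict SNP id -> effect size (beta).
    """
    if not pred_a:
        return 0.0
    return len(pred_a.keys() & pred_b.keys()) / len(pred_a)


def shared_variance_fraction(pred_a, pred_b):
    """Approximate share of A's score variance carried by SNPs also used in B.

    Assumes standardized, mutually uncorrelated genotypes, so that SNP j
    contributes beta_j ** 2 to the variance of A's score.
    """
    total = sum(beta ** 2 for beta in pred_a.values())
    if total == 0:
        return 0.0
    shared = sum(beta ** 2 for snp, beta in pred_a.items() if snp in pred_b)
    return shared / total


# Placeholder predictors (IDs and effect sizes are illustrative only).
pred_a = {"rs1": 2.0, "rs2": 1.0}
pred_b = {"rs2": 0.5, "rs9": 1.0}
```

Here half of predictor A's SNPs are shared with B, but those shared SNPs carry only about a fifth of A's (approximate) variance, which is why the abstract reports the two fractions separately.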


Subjects
Genetic Association Studies , Genetic Predisposition to Disease , Models, Genetic , Multifactorial Inheritance , Quantitative Trait, Heritable , Algorithms , Cluster Analysis , Humans , Polymorphism, Single Nucleotide , Exome Sequencing
11.
Genes (Basel) ; 11(6)2020 06 12.
Article in English | MEDLINE | ID: mdl-32545548

ABSTRACT

Preimplantation genetic testing for polygenic disease risk (PGT-P) represents a new tool to aid in embryo selection. Previous studies demonstrated the ability to obtain the necessary genotypes in the embryo with accuracy equivalent to that obtained in adults. When applied to select among adult siblings with known type I diabetes status, this approach achieved a 45-72% reduction in disease incidence compared with random selection. This study extends the analysis to 11,883 sibling pairs to evaluate the clinical utility of embryo selection with PGT-P. Results demonstrate simultaneous relative risk reduction for all diseases tested in parallel, which included diabetes, cancer, and heart disease, and indicate applicability beyond patients with a known family history of disease.


Subjects
Diabetes Mellitus, Type 1/diagnosis , Genetic Diseases, Inborn/diagnosis , Multifactorial Inheritance/genetics , Preimplantation Diagnosis/methods , Adult , Child, Preschool , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 1/pathology , Female , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/pathology , Humans , Male , Middle Aged , Pedigree , Risk Factors , Siblings
12.
Reproduction ; 160(5): A13-A17, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32413844

ABSTRACT

Since its introduction to clinical practice, preimplantation genetic testing (PGT) has become a standard of care for couples at risk of having children with monogenic disease and, through testing for chromosomal aneuploidy, for improving outcomes in patients with infertility. The primary objective of PGT is to reduce the risk of miscarriage and genetic disease and to improve the success of infertility treatment with the delivery of a healthy child. Until recently, the application of PGT to more common but complex polygenic disease was not possible, as the genetic contribution to polygenic disease has been difficult to determine, and the concept of embryo selection across multiple genetic loci has been difficult to comprehend. Several achievements, including the ability to obtain accurate, genome-wide genotypes of the human embryo and the development of population-level biobanks, have now made PGT for polygenic disease risk applicable in clinical practice. With the rapid advances in embryonic polygenic risk scoring, diverse considerations beyond technical capability have arisen.


Subjects
Aneuploidy , Fertilization in Vitro/standards , Fetal Diseases/diagnosis , Genetic Diseases, Inborn/diagnosis , Genetic Testing/methods , Preimplantation Diagnosis/methods , Female , Fetal Diseases/genetics , Genetic Diseases, Inborn/embryology , Genetic Diseases, Inborn/genetics , Humans , Pregnancy
13.
Sci Rep ; 9(1): 17515, 2019 Nov 20.
Article in English | MEDLINE | ID: mdl-31748697

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

14.
Sci Rep ; 9(1): 15286, 2019 10 25.
Article in English | MEDLINE | ID: mdl-31653892

ABSTRACT

We construct risk predictors using polygenic scores (PGS) computed from common Single Nucleotide Polymorphisms (SNPs) for a number of complex disease conditions, using L1-penalized regression (also known as LASSO) on case-control data from UK Biobank. Among the disease conditions studied are Hypothyroidism, (Resistant) Hypertension, Type 1 and 2 Diabetes, Breast Cancer, Prostate Cancer, Testicular Cancer, Gallstones, Glaucoma, Gout, Atrial Fibrillation, High Cholesterol, Asthma, Basal Cell Carcinoma, Malignant Melanoma, and Heart Attack. We obtain values for the area under the receiver operating characteristic curves (AUC) in the range ~0.58-0.71 using SNP data alone. Substantially higher predictor AUCs are obtained when incorporating additional variables such as age and sex. Some SNP predictors alone are sufficient to identify outliers (e.g., in the 99th percentile of polygenic score, or PGS) with 3-8 times higher risk than typical individuals. We validate predictors out-of-sample using the eMERGE dataset, and also with different ancestry subgroups within the UK Biobank population. Our results indicate that substantial improvements in predictive power are attainable using training sets with larger case populations. We anticipate rapid improvement in genomic prediction as more case-control data become available for analysis.
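The AUC values quoted here summarize how well polygenic scores separate cases from controls. A minimal sketch using the Mann-Whitney formulation: AUC is the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as 1/2. The function name is hypothetical, and this pairwise loop is O(cases × controls), fine for illustration; a rank-based computation would be used at biobank scale.

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney statistic.

    scores: predicted risk (e.g. polygenic score) per individual.
    labels: 1 for a case, 0 for a control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0  # case ranked above control
            elif c == k:
                wins += 0.5  # tie counts as half
    return wins / (len(cases) * len(controls))


# Two controls (0.1, 0.4) and two cases (0.35, 0.8): 3 of 4 case/control
# comparisons are ordered correctly.
example = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 0.5 means the score carries no ranking information; the 0.58-0.71 range reported for SNP-only predictors sits between that and perfect separation (1.0).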


Subjects
Breast Neoplasms/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Genomics/methods , Myocardial Infarction/genetics , Prostatic Neoplasms/genetics , Algorithms , Breast Neoplasms/diagnosis , Case-Control Studies , Diabetes Mellitus, Type 1/diagnosis , Diabetes Mellitus, Type 2/diagnosis , Female , Genetic Predisposition to Disease/genetics , Humans , Male , Models, Genetic , Multifactorial Inheritance , Myocardial Infarction/diagnosis , Polymorphism, Single Nucleotide , Prognosis , Prostatic Neoplasms/diagnosis , ROC Curve , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Risk Factors
15.
Eur J Med Genet ; 62(8): 103647, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31026593

ABSTRACT

Preimplantation genetic testing (PGT) has been successfully applied to reduce the risk of miscarriage, improve IVF success rates, and prevent inheritance of monogenic disease and unbalanced translocations. The present study provides the first method capable of simultaneously testing for aneuploidy (PGT-A), structural rearrangements (PGT-SR), and monogenic (PGT-M) disorders using a single platform. Using positive controls to establish performance characteristics, accuracies of 97 to >99% were observed for each type of testing. In addition, this study expands PGT to include predicting the risk of polygenic disorders (PGT-P) for the first time. Performance was established for two common diseases, hypothyroidism and type 1 diabetes, based on the availability of positive control samples from commercially available repositories. Data from the UK Biobank, eMERGE, and T1DBASE were used to establish and validate SNP-based predictors of each disease (7,311 SNPs for hypothyroidism and 82 for type 1 diabetes). The area under the curve (AUC) for predicting disease status from genotypes alone was 0.71 for hypothyroidism and 0.68 for type 1 diabetes. The availability of expanded PGT to evaluate the risk of polygenic disorders in the preimplantation embryo has the potential to lower the prevalence of common genetic disease in humans.


Subjects
Abortion, Spontaneous/genetics , Chromosomes/genetics , Genetic Diseases, Inborn/genetics , Preimplantation Diagnosis , Abortion, Spontaneous/physiopathology , Aneuploidy , Biopsy , Blastocyst/metabolism , Female , Genetic Diseases, Inborn/pathology , Genomic Structural Variation/genetics , Genotype , Humans , Karyotype , Multifactorial Inheritance/genetics , Pregnancy
16.
Trends Genet ; 34(10): 746-754, 2018 10.
Article in English | MEDLINE | ID: mdl-30139641

ABSTRACT

Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.


Subjects
Big Data , Genome-Wide Association Study/trends , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Genomics , Genotype , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics
17.
Genetics ; 210(2): 477-497, 2018 10.
Article in English | MEDLINE | ID: mdl-30150289

ABSTRACT

We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high-dimensional statistics (i.e., machine learning). The constructed predictors explain ∼40, 20, and 9% of total variance, respectively, for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA) and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprising almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
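The two height figures in this abstract are consistent with each other: the fraction of variance explained by the best linear fit of actual on predicted values is the squared Pearson correlation, and 0.65² ≈ 0.42, in line with the ∼40% quoted. A minimal check in plain Python (function names are illustrative):

```python
import math


def pearson_r(x, y):
    """Pearson correlation between predicted values x and observed values y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """R^2 of the best linear fit of observed on predicted values equals r^2."""
    return r * r


# Correlation ~0.65 between predicted and actual height (from the abstract).
print(round(variance_explained(0.65), 2))  # 0.42
```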


Subjects
Body Height/genetics , Models, Genetic , Genome, Human , Humans , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable