Results 1 - 20 of 309
1.
Curr Protoc ; 4(6): e1055, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38837690

ABSTRACT

Data harmonization involves combining data from multiple independent sources and processing the data to produce one uniform dataset. Merging separate genotypes or whole-genome sequencing datasets has been proposed as a strategy to increase the statistical power of association tests by increasing the effective sample size. However, data harmonization is not a widely adopted strategy due to the difficulties with merging data (including confounding produced by batch effects and population stratification). Detailed data harmonization protocols are scarce and are often conflicting. Moreover, data harmonization protocols that accommodate samples of admixed ancestry are practically non-existent. Existing data harmonization procedures must be modified to ensure the heterogeneous ancestry of admixed individuals is incorporated into additional downstream analyses without confounding results. Here, we propose a set of guidelines for merging multi-platform genetic data from admixed samples that can be adopted by any investigator with elementary bioinformatics experience. We have applied these guidelines to aggregate 1544 tuberculosis (TB) case-control samples from six separate in-house datasets and conducted a genome-wide association study (GWAS) of TB susceptibility. The GWAS performed on the merged dataset had improved power over analyzing the datasets individually and produced summary statistics free from bias introduced by batch effects and population stratification. © 2024 Wiley Periodicals LLC.

Basic Protocol 1: Processing separate datasets comprising array genotype data
Alternate Protocol 1: Processing separate datasets comprising array genotype and whole-genome sequencing data
Alternate Protocol 2: Performing imputation using a local reference panel
Basic Protocol 2: Merging separate datasets
Basic Protocol 3: Ancestry inference using ADMIXTURE and RFMix
Basic Protocol 4: Batch effect correction using pseudo-case-control comparisons
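
The abstract names pseudo-case-control comparisons for batch effect correction but does not describe the procedure. The sketch below shows one plausible reading, stated as an assumption: samples from one batch are treated as pseudo-cases and samples from another batch as pseudo-controls, allele counts are compared per variant with a 1-df chi-square test, and variants that differ strongly between batches are flagged. The function names and the `1e-4` threshold are hypothetical and are not taken from the protocol.

```python
import math

def allele_count_chi2(alt_a, total_a, alt_b, total_b):
    """2x2 chi-square test (1 df) on alternate/reference allele counts in two batches."""
    ref_a, ref_b = total_a - alt_a, total_b - alt_b
    n = total_a + total_b
    alt, ref = alt_a + alt_b, ref_a + ref_b
    # Empty batch or monomorphic variant: no evidence of a batch difference.
    if total_a == 0 or total_b == 0 or alt == 0 or ref == 0:
        return 0.0, 1.0
    chi2 = 0.0
    for obs, row, col in ((alt_a, total_a, alt), (ref_a, total_a, ref),
                          (alt_b, total_b, alt), (ref_b, total_b, ref)):
        expected = row * col / n
        chi2 += (obs - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def flag_batch_variants(counts, threshold=1e-4):
    """counts maps variant id -> (alt_a, total_a, alt_b, total_b).

    Returns the ids whose pseudo-case-control p-value is below the
    (hypothetical) threshold, in input order.
    """
    return [v for v, c in counts.items() if allele_count_chi2(*c)[1] < threshold]

# Toy allele counts, invented for illustration only.
toy_counts = {
    "rs_ok": (50, 200, 50, 200),    # same allele frequency in both batches
    "rs_bad": (10, 200, 100, 200),  # very different allele frequency
}
flagged = flag_batch_variants(toy_counts)
```

Because the pseudo-cases and pseudo-controls share the same true phenotype status, a significant difference is more likely to reflect a technical artifact than biology, which is why flagged variants would be candidates for removal before merging.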


Subjects
Genome-Wide Association Study; Humans; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Genomics/methods; Genomics/standards; Tuberculosis/genetics; Case-Control Studies; Guidelines as Topic; Genetic Predisposition to Disease
2.
PLoS Comput Biol ; 18(1): e1009628, 2022 01.
Article in English | MEDLINE | ID: mdl-35025869

ABSTRACT

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.


Subjects
Genetics, Population; Genome-Wide Association Study; Genotype; Polymorphism, Single Nucleotide/genetics; Computational Biology/methods; Genetics, Population/methods; Genetics, Population/standards; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Humans; Male; Tanzania
3.
Mol Genet Genomics ; 297(1): 33-46, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34755217

ABSTRACT

Based on molecular markers, genomic prediction enables us to speed up breeding schemes and increase the response to selection. There are several high-throughput genotyping platforms able to deliver thousands of molecular markers for genomic study purposes. However, even though it is widely applied in plant breeding, species without a reference genome cannot fully benefit from genomic tools and modern breeding schemes. We used a method to assemble a population-tailored mock genome to call single-nucleotide polymorphism (SNP) markers without an available reference genome, and for the first time, we compared the results with standard genotyping platforms (array and genotyping-by-sequencing (GBS) using a reference genome) for performance in genomic prediction models. Our results indicate that using a population-tailored mock genome to call SNPs delivers reliable estimates for the genomic relationship between genotypes. Furthermore, genomic prediction estimates were comparable to standard approaches, especially when considering only additive effects. However, mock genomes were slightly worse than arrays at predicting traits influenced by dominance effects, but still performed as well as standard GBS methods that use a reference genome. Nevertheless, the array-based SNP marker methods achieved the best predictive ability and reliability to estimate variance components. Overall, the mock genomes can be a worthy alternative for genomic selection studies, especially for those species where the reference genome is not available.


Subjects
Computational Biology; Genotyping Techniques; Models, Genetic; Animals; Chimera/genetics; Computational Biology/methods; Computational Biology/standards; Datasets as Topic; Genome; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Genomics/methods; Genomics/standards; Genotype; Genotyping Techniques/methods; Genotyping Techniques/standards; Phenotype; Reference Standards; Reproducibility of Results; Selection, Genetic; Species Specificity; Zea mays/classification; Zea mays/genetics
4.
PLoS Genet ; 17(12): e1009944, 2021 12.
Article in English | MEDLINE | ID: mdl-34941872

ABSTRACT

High-throughput genotyping of large numbers of lines remains a key challenge in plant genetics, requiring geneticists and breeders to find a balance between data quality and the number of genotyped lines under a variety of different existing genotyping technologies when resources are limited. In this work, we are proposing a new imputation pipeline ("HBimpute") that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is the use of haplotype blocks from the software HaploBlocker to identify locally similar lines and subsequently use the reads of all locally similar lines in the variant calling for a specific line. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced at 0.5X read-depth. The overall imputing error rates are cut in half compared to state-of-the-art software like BEAGLE and STITCH, while the average read-depth is increased to 83X, thus enabling the calling of copy number variation. The usefulness of the obtained imputed data panel is further evaluated by comparing the performance of sequence data in common breeding applications to that of genomic data generated with a genotyping array. For both genome-wide association studies and genomic prediction, results are on par or even slightly better than results obtained with high-density array data (600k). In particular for genomic prediction, we observe slightly higher data quality for the sequence data compared to the 600k array in the form of higher prediction accuracies. This occurred specifically when reducing the data panel to the set of overlapping markers between sequence and array, indicating that sequencing data can benefit from the same marker ascertainment as used in the array process to increase the quality and usability of genomic data.


Subjects
Genome-Wide Association Study/standards; Genotyping Techniques; Haplotypes/genetics; Software; DNA Copy Number Variations/genetics; Genome/genetics; Genomics/methods; Genotype; Polymorphism, Single Nucleotide/genetics; Whole Genome Sequencing; Zea mays/genetics
5.
Sci Rep ; 11(1): 19571, 2021 10 01.
Article in English | MEDLINE | ID: mdl-34599249

ABSTRACT

Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.
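
The core idea in this abstract, a polygenic score built only from SNPs that are not on the target SNP's chromosome, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the dictionary-based data layout, and the toy weights are all hypothetical.

```python
def loco_polygenic_score(dosages, weights, chrom_of, target_snp):
    """Leave-one-chromosome-out polygenic score for one individual.

    dosages:    SNP id -> allele dosage (0, 1 or 2) for this individual
    weights:    SNP id -> per-allele effect estimate
    chrom_of:   SNP id -> chromosome label
    target_snp: the SNP about to be tested for association

    SNPs on the same chromosome as target_snp are skipped, so the score
    captures background genetic effects without including the target's
    own signal or that of variants linked to it.
    """
    target_chrom = chrom_of[target_snp]
    return sum(
        weights[snp] * dosage
        for snp, dosage in dosages.items()
        if snp in weights and chrom_of[snp] != target_chrom
    )

# Toy values, invented for illustration only.
chrom_of = {"snpA": 1, "snpB": 1, "snpC": 2, "snpD": 3}
weights = {"snpA": 0.5, "snpB": -0.2, "snpC": 0.1, "snpD": 0.3}
dosages = {"snpA": 2, "snpB": 1, "snpC": 1, "snpD": 2}

score_for_snpA = loco_polygenic_score(dosages, weights, chrom_of, "snpA")
```

In an association model the resulting score would typically enter as a covariate alongside the target SNP's genotype. How the weights are estimated is left open here, since the abstract does not specify it.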


Subjects
Genetic Background; Genome-Wide Association Study; Models, Genetic; Multifactorial Inheritance; Phenotype; Area Under Curve; Biological Specimen Banks; Genetic Predisposition to Disease; Genetic Variation; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Humans; Polymorphism, Single Nucleotide; Quantitative Trait Loci; ROC Curve; United Kingdom
6.
Genetica ; 149(5-6): 313-325, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34480683

ABSTRACT

Reducing false discoveries caused by population stratification (PS) has always been a challenge in genome-wide association studies (GWAS). The current literature has established several single-marker approaches, including genomic control (GC), EIGENSTRAT, and the generalized linear mixed model association test (GMMAT), and multi-marker methods such as the LASSO mixed model (LASSOMM). However, the single-marker methods require prespecifying an arbitrary p value threshold in the selection process, likely resulting in suboptimal precision or recall. On the other hand, it appears that LASSOMM is extremely computationally intensive and may not be suitable for large-scale GWAS. In this paper, we propose a simple multi-marker approach (PCA-LASSO) combining principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). We utilize PCA to correct for the confounding effects of PS and LASSO with built-in cross-validation for a data-driven selection. Compared to the current single-marker approaches, the proposed PCA-LASSO provides an optimal balance between precision and recall, and consequently superior F1 scores. Similarly, compared to LASSOMM, PCA-LASSO markedly increases the precision while minimizing the loss of recall, and therefore improves the overall F1 score in the presence of PS. More importantly, PCA-LASSO reduces the computational time by more than 1000 times when compared to LASSOMM. We applied PCA-LASSO to a real dataset of Alzheimer's disease and successfully identified SNP rs429358 (gene APOE4), which has been widely reported to be associated with the onset and elevated risk of Alzheimer's disease. In conclusion, PCA-LASSO is a simple, fast, but accurate approach for GWAS in the presence of latent PS.
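
The abstract compares methods by precision, recall, and F1 score. As a reminder of how these are computed for a variant-selection method, here is a small sketch. It assumes the truly associated SNPs are known, as in a simulation; the function name and the toy SNP labels are hypothetical.

```python
def precision_recall_f1(selected, causal):
    """Score a set of selected SNPs against the set of truly causal SNPs.

    precision = true positives / number selected
    recall    = true positives / number causal
    F1        = harmonic mean of precision and recall
    Empty denominators are scored as 0.0.
    """
    selected, causal = set(selected), set(causal)
    true_positives = len(selected & causal)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(causal) if causal else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: 4 SNPs selected, 3 truly causal, 2 in common.
precision, recall, f1 = precision_recall_f1(
    selected=["snp1", "snp2", "snp3", "snp4"],
    causal=["snp1", "snp2", "snp5"],
)
```

A fixed p value threshold trades precision against recall somewhat arbitrarily, which is the motivation the abstract gives for a data-driven selection step.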


Subjects
Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Alzheimer Disease/genetics; Datasets as Topic; Genomics; Humans; Principal Component Analysis; Time Factors
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34459489

ABSTRACT

In genome-wide association studies (GWAS), it has become commonplace to test millions of single-nucleotide polymorphisms (SNPs) for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive P-value thresholding, guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.
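
The agglomeration step described above can be illustrated with a deliberately simplified sketch. The abstract does not give the actual algorithm, so the following rests on assumptions: genes are ordered along a chromosome, only the correlation between neighbouring genes' test statistics is considered, and the `0.5` cutoff is a hypothetical choice.

```python
def agglomerate_genes(genes, adjacent_corr, threshold=0.5):
    """Group neighbouring genes into loci.

    genes:         gene names ordered by chromosomal position
    adjacent_corr: adjacent_corr[i] is the correlation between the test
                   statistics of genes[i] and genes[i + 1]
    threshold:     hypothetical cutoff on absolute correlation

    A gene joins the current locus when it is strongly correlated with the
    previous gene; otherwise it starts a new locus. A locus with a single
    gene corresponds to a gene that is assessed separately.
    """
    if not genes:
        return []
    loci = [[genes[0]]]
    for gene, corr in zip(genes[1:], adjacent_corr):
        if abs(corr) >= threshold:
            loci[-1].append(gene)
        else:
            loci.append([gene])
    return loci

# Toy input, invented for illustration only.
loci = agglomerate_genes(["G1", "G2", "G3", "G4", "G5"], [0.8, 0.1, 0.6, 0.2])
```

A fuller treatment would derive the correlations from LD between the genes' SNPs and would allow non-adjacent genes to merge. This sketch only shows the basic branching between separate and joint testing.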


Subjects
Algorithms; Computational Biology/methods; Genetic Predisposition to Disease; Genetic Testing/standards; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Alleles; Chromosome Mapping; Databases, Genetic; Genetic Testing/methods; Humans; Linkage Disequilibrium; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
8.
Genet Sel Evol ; 53(1): 64, 2021 Jul 29.
Article in English | MEDLINE | ID: mdl-34325663

ABSTRACT

BACKGROUND: With the completion of a single nucleotide polymorphism (SNP) chip for honey bees, the technical basis of genomic selection is laid. However, for its application in practice, methods to estimate genomic breeding values need to be adapted to the specificities of the genetics and breeding infrastructure of this species. Drone-producing queens (DPQ) are used for mating control, and usually, they head non-phenotyped colonies that will be placed on mating stations. Breeding queens (BQ) head colonies that are intended to be phenotyped and used to produce new queens. Our aim was to evaluate different breeding program designs for the initiation of genomic selection in honey bees. METHODS: Stochastic simulations were conducted to evaluate the quality of the estimated breeding values. We developed a variation of the genomic relationship matrix to include genotypes of DPQ and tested different sizes of the reference population. The results were used to estimate genetic gain in the initial selection cycle of a genomic breeding program. This program was run over six years, and different numbers of genotyped queens per year were considered. Resources could be allocated to increase the reference population, or to perform genomic preselection of BQ and/or DPQ. RESULTS: Including the genotypes of 5000 phenotyped BQ increased the accuracy of predictions of breeding values by up to 173%, depending on the size of the reference population and the trait considered. To initiate a breeding program, genotyping a minimum number of 1000 queens per year is required. In this case, genetic gain was highest when genomic preselection of DPQ was coupled with the genotyping of 10-20% of the phenotyped BQ. For maximum genetic gain per used genotype, more than 2500 genotyped queens per year and preselection of all BQ and DPQ are required. 
CONCLUSIONS: This study shows that the first priority in a breeding program is to genotype phenotyped BQ to obtain a sufficiently large reference population, which allows successful genomic preselection of queens. To maximize genetic gain, DPQ should be preselected, and their genotypes included in the genomic relationship matrix. We suggest that the developed methods for genomic prediction are suitable for implementation in genomic honey bee breeding programs.


Subjects
Bees/genetics; Models, Genetic; Selective Breeding; Animals; Genome, Insect; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Genotyping Techniques/methods
9.
Genes (Basel) ; 12(6)2021 05 27.
Article in English | MEDLINE | ID: mdl-34071952

ABSTRACT

Description of a perpetrator's eye colour can be an important investigative lead in a forensic case with no apparent suspects. Herein, we present 11 SNPs (Eye Colour 11, EC11) that are important for eye colour prediction, together with eye colour prediction models for a two-category reporting system (blue and brown) and a three-category system (blue, intermediate, and brown). The EC11 SNPs were carefully selected from 44 pigmentary variants in seven genes previously found to be associated with eye colours in 757 Europeans (Danes, Swedes, and Italians). Mathematical models using three different reporting systems, namely a quantitative system (PIE-score), a two-category system (blue and brown), and a three-category system (blue, intermediate, brown), were used to rank the variants. SNPs with a sufficient mean variable importance (above 0.3%) were selected for EC11. Eye colour prediction models using the EC11 SNPs were developed using leave-one-out cross-validation (LOOCV) in an independent data set of 523 Norwegian individuals. Performance of the EC11 models for the two- and three-category system was compared with models based on the IrisPlex SNPs and the most important eye colour locus, rs12913832. We also compared model performances with the IrisPlex online tool (IrisPlex Web). The EC11 eye colour prediction models performed slightly better than the IrisPlex and rs12913832 models in all reporting systems and better than the IrisPlex Web in the three-category system. Three important points to consider prior to the implementation of eye colour prediction in a forensic genetic setting are discussed: (1) the reference population, (2) the SNP set, and (3) the reporting strategy.


Subjects
Eye Color/genetics; Polymorphism, Single Nucleotide; Forensic Genetics/methods; Forensic Genetics/standards; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Humans; Models, Genetic; Phenotype; Scandinavian and Nordic Countries
10.
Trends Genet ; 37(10): 868-871, 2021 10.
Article in English | MEDLINE | ID: mdl-34183185

ABSTRACT

For identification of marker-trait associations (MTAs) for complex traits in animals and plants, thousands of genome-wide association studies (GWAS) have been conducted during the past two decades. This involved regular improvement in methodology. Initially, a reference genome and SNPs were used; more recently, pan-genomes and markers such as structural variations (SVs) and k-mers are also being used.


Subjects
Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Animals; Genome/genetics; Humans; Phenotype; Plants/genetics; Polymorphism, Single Nucleotide/genetics
11.
Eur J Hum Genet ; 29(11): 1611-1624, 2021 11.
Article in English | MEDLINE | ID: mdl-34140649

ABSTRACT

Array technology to genotype single-nucleotide variants (SNVs) is widely used in genome-wide association studies (GWAS), clinical diagnostics, and linkage studies. Arrays have undergone tremendous growth in both number and content over recent years, making a comprehensive comparison all the more important. We have compared 28 genotyping arrays on their overall content, genome-wide coverage, imputation quality, presence of known GWAS loci, mtDNA variants and clinically relevant genes (i.e., American College of Medical Genetics (ACMG) actionable genes, pharmacogenetic genes, human leukocyte antigen (HLA) genes and SNV density). Our comparison shows that genome-wide coverage is highly correlated with the number of SNVs on the array but does not correlate with imputation quality, which is the main determinant of GWAS usability. Average imputation quality for all tested arrays was similar for European and African populations, indicating that this is not a good criterion for choosing a genotyping array. Rather, the additional content on the array, such as pharmacogenetics or HLA variants, should be the deciding factor. As the research question of a study will in large part determine which class of genes is of interest, there is not just one perfect array for all different research questions. This study can thus help as a guideline to determine which array best suits a study's requirements.


Subjects
Genetic Testing/standards; Genotyping Techniques/standards; Oligonucleotide Array Sequence Analysis/standards; Genetic Testing/methods; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Genotyping Techniques/methods; Humans; Oligonucleotide Array Sequence Analysis/methods; Reagent Kits, Diagnostic/standards; Sensitivity and Specificity
12.
Nat Commun ; 12(1): 3506, 2021 06 09.
Article in English | MEDLINE | ID: mdl-34108454

ABSTRACT

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. Unaccounted for, variance stratification can lead to both decreased statistical power and increased false-positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.


Subjects
Genetic Variation; Genome-Wide Association Study/methods; Algorithms; Computer Simulation; Gene Frequency; Genome-Wide Association Study/standards; Genome-Wide Association Study/statistics & numerical data; Humans; Phenotype; Sample Size
13.
Genet Sel Evol ; 53(1): 46, 2021 May 31.
Article in English | MEDLINE | ID: mdl-34058971

ABSTRACT

BACKGROUND: In dairy cattle populations in which crossbreeding has been used, animals show some level of diversity in their origins. In rotational crossbreeding, for instance, crossbred dams are mated with purebred sires from different pure breeds, and the genetic composition of crossbred animals is an admixture of the breeds included in the rotation. How to use the data of such individuals in genomic evaluations is still an open question. In this study, we aimed at providing methodologies for the use of data from crossbred individuals with an admixed genetic background together with data from multiple pure breeds, for the purpose of genomic evaluations for both purebred and crossbred animals. A three-breed rotational crossbreeding system was mimicked using simulations based on animals genotyped with the 50 K single nucleotide polymorphism (SNP) chip. RESULTS: For purebred populations, within-breed genomic predictions generally led to higher accuracies than those from multi-breed predictions using combined data of pure breeds. Adding admixed population's (MIX) data to the combined pure breed data considering MIX as a different breed led to higher accuracies. When prediction models were able to account for breed origin of alleles, accuracies were generally higher than those from combining all available data, depending on the correlation of quantitative trait loci (QTL) effects between the breeds. Accuracies varied when using SNP effects from any of the pure breeds to predict the breeding values of MIX. Using those breed-specific SNP effects that were estimated separately in each pure breed, while accounting for breed origin of alleles for the selection candidates of MIX, generally improved the accuracies. Models that are able to accommodate MIX data with the breed origin of alleles approach generally led to higher accuracies than models without breed origin of alleles, depending on the correlation of QTL effects between the breeds. 
CONCLUSIONS: Combining all available data, from both the pure breeds and the admixed population, in a multi-breed reference population is beneficial for the estimation of breeding values for pure breeds with a small reference population. For MIX, such an approach can lead to higher accuracies than considering breed origin of alleles for the selection candidates and using breed-specific SNP effects estimated separately in each pure breed. By including MIX data in the reference population of multiple breeds while considering the breed origin of alleles, accuracies can be further improved. Our findings are relevant for breeding programs in which crossbreeding is systematically applied, and also for populations that involve different subpopulations between which exchange of genetic material is routine practice.


Subjects
Cattle/genetics; Hybridization, Genetic; Polymorphism, Single Nucleotide; Animals; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Inbreeding; Models, Genetic; Quantitative Trait Loci; Reference Standards; Selective Breeding
14.
Genet Sel Evol ; 53(1): 55, 2021 Jun 29.
Article in English | MEDLINE | ID: mdl-34187354

ABSTRACT

BACKGROUND: Mathematical models are needed for the design of breeding programs using genomic prediction. While deterministic models for selection on pedigree-based estimates of breeding values (PEBV) are available, these have not been fully developed for genomic selection, with a key missing component being the accuracy of genomic EBV (GEBV) of selection candidates. Here, a deterministic method was developed to predict this accuracy within a closed breeding population based on the accuracy of GEBV and PEBV in the reference population and the distance of selection candidates from their closest ancestors in the reference population. METHODS: The accuracy of GEBV was modeled as a combination of the accuracy of PEBV and of EBV based on genomic relationships deviated from pedigree (DEBV). Loss of the accuracy of DEBV from the reference to the target population was modeled based on the effective number of independent chromosome segments in the reference population (Me). Measures of Me derived from the inverse of the variance of relationships and from the accuracies of GEBV and PEBV in the reference population, derived using either a Fisher information or a selection index approach, were compared by simulation. RESULTS: Using simulation, both the Fisher and the selection index approach correctly predicted accuracy in the target population over time, both with and without selection. The index approach, however, resulted in estimates of Me that were less affected by heritability, reference size, and selection, and which are, therefore, more appropriate as a population parameter. The variance of relationships underpredicted Me and was greatly affected by selection. A leave-one-out cross-validation approach was proposed to estimate required accuracies of EBV in the reference population. Aspects of the methods were validated using real data. 
CONCLUSIONS: A deterministic method was developed to predict the accuracy of GEBV in selection candidates in a closed breeding population. The population parameter Me that is required for these predictions can be derived from an available reference data set, and applied to other reference data sets and traits for that population. This method can be used to evaluate the benefit of genomic prediction and to optimize genomic selection breeding programs.


Subjects
Models, Genetic; Selective Breeding; Animals; Genome-Wide Association Study/methods; Genome-Wide Association Study/standards; Livestock/genetics; Pedigree; Polymorphism, Single Nucleotide; Quantitative Trait Loci
15.
Genetica ; 149(3): 143-153, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33963492

ABSTRACT

Genome-wide studies are prone to false positives due to inherently low priors and statistical power. One approach to ameliorate this problem is to seek validation of reported candidate genes across independent studies: genes with repeatedly discovered effects are less likely to be false positives. Inversely, genes reported only as many times as expected by chance alone, while possibly representing novel discoveries, are also more likely to be false positives. We show that, across over 30 genome-wide studies that reported Drosophila and Daphnia genes with possible roles in thermal adaptation, the combined lists of candidate genes and orthologous groups are rapidly approaching the total number of genes and orthologous groups in the respective genomes. This is consistent with the expectation of high frequency of false positives. The majority of these spurious candidates have been identified by one or a few studies, as expected by chance alone. In contrast, a noticeable minority of genes have been identified by numerous studies with the probabilities of such discoveries occurring by chance alone being exceedingly small. For this subset of genes, different studies are in agreement with each other despite differences in the ecological settings, genomic tools and methodology, and reporting thresholds. We provide a reference set of presumed true positives among Drosophila candidate genes and orthologous groups involved in response to changes in temperature, suitable for cross-validation purposes. Despite this approach being prone to false negatives, this list of presumed true positives includes several hundred genes, consistent with the "omnigenic" concept of genetic architecture of complex traits.
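
The "expected by chance alone" reasoning can be made concrete with a binomial tail probability. The sketch below rests on a simplifying assumption that is not taken from the paper: every study reports any given gene independently and with the same probability. The function name and the example numbers are hypothetical.

```python
import math

def prob_reported_at_least(k, n_studies, p_report):
    """P(a gene is reported by at least k of n_studies by chance alone).

    Assumes each study reports the gene independently with probability
    p_report, so the number of reports follows Binomial(n_studies, p_report).
    """
    return sum(
        math.comb(n_studies, i) * p_report**i * (1 - p_report) ** (n_studies - i)
        for i in range(k, n_studies + 1)
    )

# Hypothetical setting: 30 studies, each reporting about 5% of the genome.
p_once_or_more = prob_reported_at_least(1, 30, 0.05)
p_ten_or_more = prob_reported_at_least(10, 30, 0.05)
```

Under these assumed numbers, a gene appearing in at least one candidate list is unremarkable, while a gene appearing in ten or more lists is very unlikely under the chance model. That contrast is the logic behind treating repeatedly discovered genes as presumed true positives.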


Subjects
Genome-Wide Association Study/methods; Quantitative Trait Loci; Thermotolerance/genetics; Animals; Arthropods/genetics; Arthropods/physiology; False Positive Reactions; Genome-Wide Association Study/standards; Models, Genetic; Polymorphism, Genetic; Reference Standards
16.
Med Sci Sports Exerc ; 53(5): 883-887, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33844668

ABSTRACT

It is clear, based on a deep scientific literature base, that genetic and genomic factors play significant roles in determining a wide range of sport and exercise characteristics including exercise endurance capacity, strength, daily physical activity levels, and trainability of both endurance and strength. Although the research field of exercise systems genetics has rapidly expanded over the past two decades, many researchers publishing in this field are not extensively trained in molecular biology or genomics techniques, sometimes creating gaps in generating high-quality and cutting-edge research for publication. As current or former Associate Editors for Medicine and Science in Sports and Exercise who have handled the majority of exercise genetics articles for Medicine and Science in Sports and Exercise in the past 15 yr, we have observed a large number of scientific manuscripts submitted for publication review that have exhibited significant flaws preventing their publication, flaws that often stem directly from a lack of knowledge regarding the "state-of-the-art" methods and accepted literature base that is rapidly changing as the field evolves. The purpose of this commentary is to provide researchers, especially those coming from a nongenetics background attempting to publish in the exercise systems genetics area, with recommendations regarding best-practice research standards and data analysis in the field of exercise systems genetics, to strengthen the overall literature in this important and evolving field of research.


Subjects
Exercise , Physiological Phenomena/genetics , Polymorphism, Single Nucleotide/genetics , Publishing/standards , Research/standards , Athletic Performance/physiology , Data Analysis , Genome-Wide Association Study/standards , Genotype , Humans , Muscle Strength/genetics , Phenotype , Physical Conditioning, Human , Physical Endurance/genetics , Quality Control , Reproducibility of Results , Research Design/standards , Reverse Transcriptase Polymerase Chain Reaction , Sample Size , Sports/physiology
17.
Genetics ; 217(3), 2021 03 31.
Article in English | MEDLINE | ID: mdl-33789342

ABSTRACT

Ghost quantitative trait loci (QTL) are false discoveries in QTL mapping that arise from the "accumulation" of polygenic effects uniformly distributed over the genome. The chromosomal locations that are strongly correlated with the total of the polygenic effects depend on the specific sample correlation structure determined by the genotypes at all loci. The problem is particularly severe when the same genotypes are used to study multiple QTL, e.g., when using recombinant inbred lines or studying expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and suggest that it can be solved by applying an extended mixed-effects model in which the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in a stepwise selection strategy, which allows for the simultaneous detection of several QTL. Extensive simulation studies illustrate that our approach eliminates ghost QTL and false hotspots while preserving high power to detect true QTL.


Subjects
Crosses, Genetic , Models, Genetic , Multifactorial Inheritance , Quantitative Trait Loci , Animals , Breeding/methods , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Plants/genetics
18.
Genome Res ; 31(4): 529-537, 2021 04.
Article in English | MEDLINE | ID: mdl-33536225

ABSTRACT

Low-pass sequencing (sequencing a genome to an average depth less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array [GSA]) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated overall imputation accuracy from these different assays as well as overall power for GWAS from imputed data and computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher compared to the Illumina GSA.
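
To see why low-pass data depend on imputation, a rough sketch of per-site read depth may help. It assumes the common idealisation that reads fall uniformly at random, so depth at a site is Poisson-distributed with mean equal to the average coverage; this model and the function name are assumptions for illustration and are not part of the study's methods.

```python
from math import exp, factorial


def depth_pmf(mean_coverage: float, depth: int) -> float:
    """P(read depth == depth) under a Poisson model with the given mean coverage."""
    return exp(-mean_coverage) * mean_coverage**depth / factorial(depth)


# The two target coverage levels named in the abstract.
for coverage in (0.5, 1.0):
    no_reads = depth_pmf(coverage, 0)
    one_read = depth_pmf(coverage, 1)
    print(
        f"{coverage}x: P(no reads) = {no_reads:.3f}, "
        f"P(exactly one read) = {one_read:.3f}"
    )
```

Under this idealised model roughly 61% of sites receive no reads at 0.5× and about 37% at 1×, so most genotypes cannot be called directly and have to be inferred from a haplotype reference panel.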


Subjects
Genome-Wide Association Study , Genotype , High-Throughput Nucleotide Sequencing , Genome, Human , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Haplotypes , Humans , Risk Factors
19.
PLoS Comput Biol ; 17(2): e1007784, 2021 02.
Article in English | MEDLINE | ID: mdl-33606672

ABSTRACT

Rare variants are thought to play an important role in the etiology of complex diseases and may explain a significant fraction of the missing heritability in genetic disease studies. Next-generation sequencing facilitates the association of rare variants in coding or regulatory regions with complex diseases in large cohorts at genome-wide scale. However, rare variant association studies (RVAS) still lack power when cohorts are small to medium-sized and when genetic variation explains only a small fraction of phenotypic variance. Here we present a novel Bayesian rare variant Association Test using Integrated Nested Laplace Approximation (BATI). Unlike existing RVAS tests, BATI allows integration of individual or variant-specific features as covariates, while efficiently performing inference based on full model estimation. We demonstrate that BATI outperforms established RVAS methods on realistic, semi-synthetic whole-exome sequencing cohorts, especially when using meaningful biological context, such as functional annotation. We show that BATI achieves power above 70% in scenarios in which competing tests fail to identify risk genes, e.g., when risk variants together explain less than 0.5% of phenotypic variance. We have integrated BATI, together with five existing RVAS tests, into the 'Rare Variant Genome Wide Association Study' (rvGWAS) framework for data generated by whole-exome or whole-genome sequencing. rvGWAS supports rare variant association for genes or any other biological unit, such as promoters, while providing essential functionality such as quality control and filtering. Applying rvGWAS to a chronic lymphocytic leukemia study, we identified eight candidate predisposition genes, including EHMT2 and COPS7A.


Subjects
Genetic Variation , Genome-Wide Association Study/methods , Bayes Theorem , Benchmarking , Breast Neoplasms/genetics , COP9 Signalosome Complex/genetics , Case-Control Studies , Cohort Studies , Computational Biology , Computer Simulation , Data Interpretation, Statistical , Databases, Genetic , Female , Genetic Predisposition to Disease , Genome-Wide Association Study/standards , Genome-Wide Association Study/statistics & numerical data , Histocompatibility Antigens/genetics , Histone-Lysine N-Methyltransferase/genetics , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Quality Control , Risk Factors , Transcription Factors/genetics , Exome Sequencing/methods , Exome Sequencing/standards , Exome Sequencing/statistics & numerical data , Whole Genome Sequencing/methods , Whole Genome Sequencing/statistics & numerical data
20.
Am J Med Genet B Neuropsychiatr Genet ; 186(1): 16-27, 2021 01.
Article in English | MEDLINE | ID: mdl-33576176

ABSTRACT

Genotype imputation across populations of mixed ancestry is critical for optimal discovery in large-scale genome-wide association studies (GWAS). Methods for direct imputation of GWAS summary statistics were previously shown to be practically as accurate as summary statistics produced after raw genotype imputation, while incurring a computational burden that is orders of magnitude lower. Because direct imputation needs a precise estimate of linkage disequilibrium (LD), and because most methods use a small reference panel (for example, the ~2,500 subjects of the 1000 Genomes Project), there is a great need for much larger and more diverse reference panels. To accurately estimate the LD needed for an exhaustive analysis of any cosmopolitan cohort, we developed DISTMIX2. DISTMIX2 (a) uses a much larger and more diverse reference panel than traditional reference panels, and (b) can estimate ethnic-mixture weights based solely on Z-scores when allele frequencies are not available. We applied DISTMIX2 to GWAS summary statistics from the Psychiatric Genomics Consortium (PGC). DISTMIX2 uncovered signals in numerous new regions, with most of these findings coming from rarer variants. Rarer variants localize signals much more sharply than common variants, because LD for rare variants extends over a shorter distance than for common ones. For example, while the original PGC post-traumatic stress disorder GWAS found only 3 marginal signals for common variants, we now uncover a very strong signal for a rare variant in PKN2, a gene associated with neuronal and hippocampal development. Thus, DISTMIX2 provides a robust and fast (re)imputation approach for most psychiatric GWAS.
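
The general idea of direct summary-statistic imputation can be sketched as follows. This is not the DISTMIX2 implementation: it is a toy example of the standard conditional-expectation formula for correlated Z-scores, z_u = r_ut R_tt^{-1} z_t, with the LD taken as a weighted average of per-population LD matrices. All function names, the restriction to two typed SNPs, and the numeric values are hypothetical.

```python
def mix_ld(ld_by_population, weights):
    """Weighted average of per-population LD (correlation) matrices."""
    n_rows = len(ld_by_population[0])
    n_cols = len(ld_by_population[0][0])
    return [
        [
            sum(w * ld[i][j] for w, ld in zip(weights, ld_by_population))
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("LD matrix of typed SNPs is singular")
    return [
        (a[1][1] * b[0] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def impute_z(r_tt, r_ut, z_typed):
    """Impute the Z-score of one untyped SNP from two typed SNPs.

    r_tt: 2x2 LD matrix among typed SNPs; r_ut: LD between the untyped SNP
    and each typed SNP. Returns (imputed Z, info), where info is the share
    of the untyped SNP's variance explained by the typed SNPs.
    """
    w = solve_2x2(r_tt, r_ut)  # R_tt^{-1} r_tu (R_tt is symmetric)
    z_imputed = sum(wi * zi for wi, zi in zip(w, z_typed))
    info = sum(wi * ri for wi, ri in zip(w, r_ut))
    return z_imputed, info


# Hypothetical LD among two typed SNPs in two reference populations.
ld_pop_a = [[1.0, 0.2], [0.2, 1.0]]
ld_pop_b = [[1.0, 0.6], [0.6, 1.0]]
r_tt = mix_ld([ld_pop_a, ld_pop_b], weights=[0.7, 0.3])

# Hypothetical mixed LD between the untyped SNP and the typed SNPs.
r_ut = [0.5, 0.4]
z_imputed, info = impute_z(r_tt, r_ut, z_typed=[3.1, 2.4])
print(f"imputed Z = {z_imputed:.3f}, info = {info:.3f}")
```

A real tool would work on windows of many SNPs, regularise the LD matrix before inversion, and estimate the mixture weights from data; the sketch only shows why accurate LD for the cohort's ancestry mix is the critical input.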


Subjects
Genome-Wide Association Study/standards , Mental Disorders/diagnosis , Mental Disorders/genetics , Polymorphism, Single Nucleotide , Cohort Studies , Gene Frequency , Humans , Linkage Disequilibrium , Phenotype , Reference Standards , Software