Results 1-20 of 42

1.
Am J Hum Genet ; 108(4): 656-668, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.


Subject(s)
DNA Mutational Analysis/economics , DNA Mutational Analysis/standards , Genetic Variation/genetics , Genetics, Population/economics , Africa , DNA Mutational Analysis/methods , Genetics, Population/methods , Genome, Human/genetics , Genome-Wide Association Study , Health Equity , Humans , Microbiota , Whole Genome Sequencing/economics , Whole Genome Sequencing/standards
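The downsampling design in this abstract (thin deep sequencing data to lower depths, then score the resulting genotype calls against the deep calls) can be sketched in Python. This is a minimal illustration, not the study's pipeline, and it rests on assumptions of ours: reads are thinned uniformly at random, genotypes are alternate-allele counts (0, 1, 2) with `None` for a missing call, and concordance is the fraction of sites called in both datasets whose genotypes match. The helper names are hypothetical, and the imputation step that sits between the two functions is omitted.

```python
import random


def downsample_reads(reads, full_depth, target_depth, seed=0):
    """Keep each read independently with probability target_depth / full_depth.

    Hypothetical helper: uniform random thinning, so the retained reads have
    the target depth on average.
    """
    if not 0 < target_depth <= full_depth:
        raise ValueError("target_depth must be in (0, full_depth]")
    keep_prob = target_depth / full_depth
    rng = random.Random(seed)  # seeded so the thinning is reproducible
    return [read for read in reads if rng.random() < keep_prob]


def genotype_concordance(truth, test):
    """Fraction of sites called in both datasets whose genotypes agree.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    pairs = [(t, c) for t, c in zip(truth, test) if t is not None and c is not None]
    if not pairs:
        return float("nan")
    return sum(t == c for t, c in pairs) / len(pairs)


# Usage: thin a hypothetical 30x read set to 4x, then score toy genotype calls.
reads = list(range(30_000))  # stand-in read identifiers
low_pass = downsample_reads(reads, full_depth=30, target_depth=4)
print(len(low_pass))  # roughly 4/30 of the reads
print(genotype_concordance([0, 1, 2, 1], [0, 1, 1, None]))  # 2 of 3 shared sites agree
```

Because each read is kept independently, the retained depth matches the target only on average; read-level subsampling of aligned data (for example with `samtools view -s`) applies the same idea.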
2.
Genome Res ; 31(4): 529-537, 2021 04.
Article in English | MEDLINE | ID: mdl-33536225

ABSTRACT

Low-pass sequencing (sequencing a genome to an average depth less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array [GSA]) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated overall imputation accuracy from these different assays as well as overall power for GWAS from imputed data and computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher compared to the Illumina GSA.


Subject(s)
Genome-Wide Association Study , Genotype , High-Throughput Nucleotide Sequencing , Genome, Human , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Haplotypes , Humans , Risk Factors
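Polygenic risk scores built from "previously derived weights", as in this abstract, are commonly computed as a weighted sum of each person's imputed allele dosages. The sketch below assumes that additive form; the variant IDs, dosages, and weights are made up, and it simply raises an error on missing variants, whereas real scoring tools also deal with allele alignment and missing data in ways not shown here. Because imputed dosages are continuous estimates, any imputation error propagates directly into the score, which is why imputation accuracy matters for PRS.

```python
def polygenic_score(dosages, weights):
    """Polygenic risk score for one individual.

    dosages: variant ID -> imputed alternate-allele dosage (0.0 to 2.0)
    weights: variant ID -> per-allele effect weight from an earlier study
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no dosage for variants: {sorted(missing)}")
    # Additive model: sum of weight * dosage over the variants in the score.
    return sum(w * dosages[v] for v, w in weights.items())


# Usage with hypothetical variant IDs, dosages, and weights.
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
person = {"var1": 1.8, "var2": 0.2, "var3": 1.0, "var4": 2.0}  # var4 has no weight
print(round(polygenic_score(person, weights), 4))
```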
3.
Nature ; 533(7604): 539-42, 2016 05 26.
Article in English | MEDLINE | ID: mdl-27225129

ABSTRACT

Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.


Subject(s)
Brain/metabolism , Educational Status , Fetus/metabolism , Gene Expression Regulation/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Alzheimer Disease/genetics , Bipolar Disorder/genetics , Cognition , Computational Biology , Gene-Environment Interaction , Humans , Molecular Sequence Annotation , Schizophrenia/genetics , United Kingdom
4.
BMC Genomics ; 22(1): 197, 2021 Mar 20.
Article in English | MEDLINE | ID: mdl-33743587

ABSTRACT

BACKGROUND: Low-pass sequencing has been proposed as a cost-effective alternative to genotyping arrays to identify genetic variants that influence multifactorial traits in humans. For common diseases this typically has required both large sample sizes and comprehensive variant discovery. Genotyping arrays are also routinely used to perform pharmacogenetic (PGx) experiments where sample sizes are likely to be significantly smaller, but clinically relevant effect sizes likely to be larger. RESULTS: To assess how low-pass sequencing would compare to array-based genotyping for PGx, we compared a low-pass assay (in which 1x coverage or less of a target genome is sequenced), combined with genotype imputation software, to standard approaches. We sequenced 79 individuals to 1x genome coverage and genotyped the same samples on the Affymetrix Axiom Biobank Precision Medicine Research Array (PMRA). We then down-sampled the sequencing data to 0.8x, 0.6x, and 0.4x coverage, and performed imputation. Both the genotype data and the sequencing data were further used to impute human leukocyte antigen (HLA) genotypes for all samples. We compared the sequencing data and the genotyping array data in terms of four metrics: overall concordance, concordance at single nucleotide polymorphisms in pharmacogenetics-related genes, concordance in imputed HLA genotypes, and imputation r². Overall concordance between the two assays ranged from 98.2% (for 0.4x coverage sequencing) to 99.2% (for 1x coverage sequencing), with qualitatively similar numbers for the subsets of variants most important in pharmacogenetics. At common single nucleotide polymorphisms (SNPs), the mean imputation r² from the genotyping array was 0.90, which was comparable to the imputation r² from 0.4x coverage sequencing, while the mean imputation r² from 1x sequencing data was 0.96.
CONCLUSIONS: These results indicate that low-pass sequencing to a depth above 0.4x coverage attains higher power for association studies when compared to the PMRA and should be considered as a competitive alternative to genotyping arrays for trait mapping in pharmacogenetics.


Subject(s)
Genome-Wide Association Study , Pharmacogenetics , Genotype , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide
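Imputation r² is one of the four metrics in this abstract. One common definition, assumed here, is the squared Pearson correlation between imputed dosages and genotypes from an independent assay treated as truth; imputation programs also report an estimated r² computed from the dosages alone, and the abstract does not say which version was used. The function name and toy data are hypothetical.

```python
def dosage_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages.

    Returns NaN when either vector is constant, since r2 is undefined there.
    """
    n = len(true_genotypes)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_g = sum(true_genotypes) / n
    mean_d = sum(dosages) / n
    s_gd = sum((g - mean_g) * (d - mean_d) for g, d in zip(true_genotypes, dosages))
    s_gg = sum((g - mean_g) ** 2 for g in true_genotypes)
    s_dd = sum((d - mean_d) ** 2 for d in dosages)
    if s_gg == 0 or s_dd == 0:
        return float("nan")
    return s_gd * s_gd / (s_gg * s_dd)


# Usage: "true" genotypes for five samples at one SNP vs. hypothetical imputed dosages.
truth = [0, 1, 2, 1, 0]
imputed = [0.1, 0.9, 1.7, 1.2, 0.3]
print(round(dosage_r2(truth, imputed), 3))
```

Note that r² is insensitive to a constant shift or rescaling of the dosages, so it measures how well imputation ranks individuals rather than whether dosages are calibrated.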
5.
PLoS Biol ; 15(9): e2002458, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28873088

ABSTRACT

A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists of testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we found only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that, when large, even late-onset effects are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence 1 of 42 traits, we detected a number of strong signals. In participants of the UK Biobank of British ancestry, we found that variants that delay puberty timing are associated with a longer parental life span (P ~ 6.2 × 10⁻⁶ for fathers and P ~ 2.0 × 10⁻³ for mothers), consistent with epidemiological studies. Similarly, variants associated with later age at first birth are associated with a longer maternal life span (P ~ 1.4 × 10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease (CAD), body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. We also found marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of CAD and cholesterol levels. Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical data sets can be used to learn about selection effects in contemporary humans.


Subject(s)
Evolution, Molecular , Genetic Fitness , Genetics, Population/methods , Models, Genetic , Selection, Genetic , Cohort Studies , Female , Gene Frequency , Genetic Variation , Humans , Male
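The core idea in this abstract, that an allele which raises mortality should become rarer among older people, can be illustrated with an ordinary least-squares slope of allele count on age. This is a deliberately simplified sketch and not the authors' method: the abstract's test accounts for variation in ancestry, while this function does not, and the function name and toy data are hypothetical.

```python
def allele_age_slope(ages, genotypes):
    """Least-squares slope of alternate-allele count (0/1/2) on age.

    Dividing by 2 gives the change in allele frequency per year of age.
    Simplification: NO adjustment for ancestry or other covariates.
    """
    n = len(ages)
    if n != len(genotypes) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 people")
    mean_a = sum(ages) / n
    mean_g = sum(genotypes) / n
    s_ag = sum((a - mean_a) * (g - mean_g) for a, g in zip(ages, genotypes))
    s_aa = sum((a - mean_a) ** 2 for a in ages)
    if s_aa == 0:
        raise ValueError("ages must vary")
    return s_ag / s_aa


# Toy data in which older participants carry fewer copies of the allele.
ages = [40, 50, 60, 70]
genotypes = [2, 1, 1, 0]
slope = allele_age_slope(ages, genotypes)
print(slope, slope / 2)  # negative: allele depleted at older ages
```

Without ancestry adjustment, a slope like this can also arise if ancestry composition differs between age groups, which is exactly the confounding the abstract's method is designed to remove.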
6.
Nature ; 482(7385): 390-4, 2012 Feb 05.
Article in English | MEDLINE | ID: mdl-22307276

ABSTRACT

The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants 'DNase I sensitivity quantitative trait loci' (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.


Subject(s)
DNA Footprinting , Deoxyribonuclease I/metabolism , Gene Expression Regulation/genetics , Genetic Variation/genetics , Quantitative Trait Loci/genetics , Chromatin/genetics , Chromatin/metabolism , Gene Expression Profiling , Genome, Human/genetics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Transcription Factors/metabolism
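The abstract reports dsQTLs at a false discovery rate of 10% but does not name the procedure. A common way to control the FDR across many genotype-versus-read-depth tests is the Benjamini-Hochberg step-up procedure, sketched below as an assumption; the p-values are made up and the association tests that would produce them are not shown.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank k with p_(k) <= (k / m) * fdr.
        if pvalues[idx] <= rank / m * fdr:
            cutoff = rank
    return sorted(order[:cutoff])


# Usage with made-up p-values from four genotype vs. DNase-seq depth tests.
pvals = [0.06, 0.07, 0.01, 0.9]
print(benjamini_hochberg(pvals))  # [0, 1, 2]
```

Test 0 (p = 0.06) is rejected even though it misses its own rank threshold (0.05), because a larger-ranked p-value (0.07 at rank 3) passes its threshold; this step-up behaviour is what distinguishes Benjamini-Hochberg from a simple per-rank check.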
7.
Trends Genet ; 30(9): 377-89, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25168683

ABSTRACT

Genetic information contains a record of the history of our species, and technological advances have transformed our ability to access this record. Many studies have used genome-wide data from populations today to learn about the peopling of the globe and subsequent adaptation to local conditions. Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture, and population replacement subsequent to the initial out-of-Africa expansion have altered the genetic structure of most of the world's human populations. In light of this we argue that it is time to critically reevaluate current models of the peopling of the globe, as well as the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection.


Subject(s)
DNA/genetics , DNA/history , Genetics, Population , Genome, Human , Geography , Selection, Genetic/genetics , Africa , Evolution, Molecular , History, Ancient , Humans , Phenotype
8.
Am J Hum Genet ; 94(4): 559-73, 2014 Apr 03.
Article in English | MEDLINE | ID: mdl-24702953

ABSTRACT

Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWASs). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. I describe a statistical model that uses association statistics computed across the genome to identify classes of genomic elements that are enriched with or depleted of loci influencing a trait. The model naturally incorporates multiple types of annotations. I applied the model to GWASs of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, body mass index, and Crohn disease. For each trait, I used the model to evaluate the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over 100 tissues and cell lines. The fraction of phenotype-associated SNPs influencing protein sequence ranged from around 2% (for platelet volume) up to around 20% (for low-density lipoprotein cholesterol), repressed chromatin was significantly depleted for SNPs associated with several traits, and cell-type-specific DNase-I hypersensitive sites were enriched with SNPs associated with several traits (for example, the spleen in platelet volume). Finally, reweighting each GWAS by using information from functional genomics increased the number of loci with high-confidence associations by around 5%.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Bayes Theorem , Humans , Phenotype , Polymorphism, Single Nucleotide
9.
Bioinformatics ; 32(2): 283-5, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26395773

ABSTRACT

UNLABELLED: We present a method to identify approximately independent blocks of linkage disequilibrium in the human genome. These blocks enable automated analysis of multiple genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: code: http://bitbucket.org/nygcresearch/ldetect; data: http://bitbucket.org/nygcresearch/ldetect-data. CONTACT: tberisa@nygenome.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromosome Mapping/methods , Genome, Human , Genome-Wide Association Study , Software , Algorithms , Genetic Markers , Humans , Linkage Disequilibrium
10.
Proc Natl Acad Sci U S A ; 111(7): 2632-7, 2014 Feb 18.
Article in English | MEDLINE | ID: mdl-24550290

ABSTRACT

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to ∼900-1,800 y ago and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to ∼2,700-3,300 y ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.


Subject(s)
Demography , Emigration and Immigration , Ethnicity/genetics , Genetics, Population/methods , White People/genetics , Africa, Eastern , Africa, Southern , Computer Simulation , Europe/ethnology , Gene Flow , Gene Frequency , Genotype , Humans , Linkage Disequilibrium , Models, Genetic
11.
Nature ; 464(7289): 768-72, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20220758

ABSTRACT

Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation/genetics , Genetic Variation/genetics , RNA, Messenger/analysis , RNA, Messenger/genetics , Transcription, Genetic/genetics , Alleles , Black People/genetics , Consensus Sequence/genetics , DNA, Complementary/genetics , Exons/genetics , Humans , Nigeria , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , RNA Splice Sites/genetics , Sequence Analysis, RNA
12.
Genome Res ; 22(4): 602-10, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22207615

ABSTRACT

Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success.


Subject(s)
Genetic Variation , Primates/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Animals , Endangered Species , Evolution, Molecular , Genome/genetics , Humans , Liver/metabolism , Phylogeny , Primates/classification , Species Specificity
13.
PLoS Genet ; 8(11): e1002967, 2012.
Article in English | MEDLINE | ID: mdl-23166502

ABSTRACT

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.


Subject(s)
Gene Frequency , Genetic Drift , Genome-Wide Association Study , Algorithms , Animals , Breeding , Dogs , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Population/genetics , Wolves/genetics
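Under the Gaussian approximation to drift mentioned in this abstract, an allele's frequency in a descendant population varies around its ancestral value with variance proportional to p(1 − p), so the shared drift of populations shows up as covariance in their allele frequencies. Roughly speaking, TreeMix fits a graph whose implied covariance matches an empirical matrix of this kind. The sketch below computes only a simple empirical scaled covariance; its exact normalization is our assumption rather than TreeMix's code, it omits the refinements a real implementation needs, and the frequencies are made up.

```python
def scaled_allele_covariance(freqs):
    """Scaled covariance of allele frequencies across populations.

    freqs[k][i] is the allele frequency of SNP k in population i. Each SNP's
    deviations from its cross-population mean are divided by mean * (1 - mean),
    mirroring the p(1 - p) scaling of drift variance. SNPs with mean 0 or 1
    carry no information and are skipped.
    """
    n_pop = len(freqs[0])
    cov = [[0.0] * n_pop for _ in range(n_pop)]
    n_used = 0
    for snp in freqs:
        mean = sum(snp) / n_pop
        scale = mean * (1.0 - mean)
        if scale == 0.0:
            continue
        dev = [x - mean for x in snp]
        for i in range(n_pop):
            for j in range(n_pop):
                cov[i][j] += dev[i] * dev[j] / scale
        n_used += 1
    if n_used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[c / n_used for c in row] for row in cov]


# Toy frequencies for 3 SNPs in 3 hypothetical populations (A, B, C);
# A and B are similar, so they should covary positively.
freqs = [
    [0.10, 0.15, 0.80],
    [0.50, 0.55, 0.20],
    [0.30, 0.35, 0.90],
]
for row in scaled_allele_covariance(freqs):
    print([round(x, 3) for x in row])
```

Because deviations are taken from the cross-population mean, every row of the matrix sums to zero; a bifurcating tree or an admixture graph then predicts a particular pattern in the off-diagonal entries.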
14.
PLoS Genet ; 8(10): e1003000, 2012.
Article in English | MEDLINE | ID: mdl-23071454

ABSTRACT

Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci ("rdQTLs"). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.


Subject(s)
Gene Expression , Genetic Variation , Quantitative Trait Loci , RNA Stability , Cell Line , Chromosome Mapping , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA Interference
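The abstract estimates gene-specific mRNA decay rates from a time-course design but does not describe the model. A minimal sketch, assuming first-order (exponential) decay so that log expression falls linearly with time, fits the slope by least squares; the decay rate is the negated slope and the half-life is ln(2)/k. The function name and the synthetic data are hypothetical.

```python
import math


def fit_decay_rate(times, levels):
    """Fit log(level) = a - k * t by least squares.

    Returns (k, half_life); half_life is ln(2) / k, or infinity if no decay
    is seen. Assumes exponential decay and strictly positive levels.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two time points")
    logs = [math.log(y) for y in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("time points must differ")
    s_tl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    k = -s_tl / s_tt
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life


# Synthetic time course (arbitrary time units) with a true decay rate of 0.3.
times = [0, 1, 2, 4]
levels = [100 * math.exp(-0.3 * t) for t in times]
k, half_life = fit_decay_rate(times, levels)
print(round(k, 3), round(half_life, 2))
```

With a rate per gene in hand, variation across individuals in k can be tested for association with nearby genotypes in the same way as steady-state expression.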
15.
Proc Biol Sci ; 281(1789): 20140930, 2014 Aug 22.
Article in English | MEDLINE | ID: mdl-24990677

ABSTRACT

While gene flow between distantly related populations is increasingly recognized as a potentially important source of adaptive genetic variation for humans, fully characterized examples are rare. In addition, the role that natural selection for resistance to vivax malaria may have played in the extreme distribution of the protective Duffy-null allele, which is nearly completely fixed in mainland sub-Saharan Africa and absent elsewhere, is controversial. We address both these issues by investigating the evolution of the Duffy-null allele in the Malagasy, a recently admixed population with major ancestry components from both East Asia and mainland sub-Saharan Africa. We used genome-wide genetic data and extensive computer simulations to show that the high frequency of the Duffy-null allele in Madagascar can only be explained in the absence of positive natural selection under extreme demographic scenarios involving high genetic drift. However, the observed genomic single nucleotide polymorphism diversity in the Malagasy is incompatible with such extreme demographic scenarios, indicating that positive selection for the Duffy-null allele best explains the high frequency of the allele in Madagascar. We estimate the selection coefficient to be 0.066. Because vivax malaria is endemic to Madagascar, this result supports the hypothesis that malaria resistance drove fixation of the Duffy-null allele in mainland sub-Saharan Africa.


Subject(s)
Duffy Blood-Group System/genetics , Gene Frequency , Receptors, Cell Surface/genetics , Selection, Genetic , Africa South of the Sahara , Asian People/genetics , Black People/genetics , Computer Simulation , Genetic Drift , Genetics, Population , Humans , Madagascar , Models, Genetic , Polymorphism, Single Nucleotide
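To get a feel for what a selection coefficient of this size implies, one can iterate the standard deterministic recursion for selection at a biallelic locus. This is not the authors' approach, which combined genome-wide data with demographic simulations including drift: the sketch assumes random mating, an infinite population, and an additive dominance value h = 0.5, all of which are our assumptions, and the abstract's estimate may be defined under a different model.

```python
def next_freq(p, s, h=0.5):
    """Frequency of the favoured allele A after one generation of selection.

    Assumes random mating and no drift, with genotype fitnesses
    AA = 1 + s, Aa = 1 + h * s, aa = 1.
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 + s, 1.0 + h * s, 1.0
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A's share after selection: p times the marginal fitness of A, over mean fitness.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def generations_to_reach(p0, target, s, h=0.5, max_generations=100_000):
    """Generations for A to rise from p0 to at least target; None if not reached."""
    p, gens = p0, 0
    while p < target:
        if gens >= max_generations:
            return None
        p = next_freq(p, s, h)
        gens += 1
    return gens


# s = 0.066 is the abstract's estimate; p0, target, and h = 0.5 are illustrative.
print(generations_to_reach(0.01, 0.5, s=0.066))
```

In a finite population drift adds noise around this trajectory, which is why the study compared observed frequencies against simulations rather than a deterministic curve.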
16.
G3 (Bethesda) ; 14(2)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38038370

ABSTRACT

Low-pass sequencing with genotype imputation has been adopted as a cost-effective method for genotyping. The most widely used method of short-read sequencing uses sequencing by synthesis (SBS). Here we perform a study of a novel sequencing technology, avidity sequencing. In this short note, we compare the performance of imputation from low-pass libraries sequenced on an Element AVITI system (which utilizes avidity sequencing) to those sequenced on an Illumina NovaSeq 6000 (which utilizes SBS) with an SP flow cell for the same set of biological samples across a range of genetic ancestries. We observed dramatically lower optical duplication rates in the data deriving from the AVITI system compared to the NovaSeq 6000, resulting in higher effective coverage given a fixed number of sequenced bases, and comparable imputation accuracy between sequencing chemistries across ancestries. This study demonstrates that avidity sequencing is a viable alternative to the standard SBS chemistries for applications involving low-pass sequencing plus imputation.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genotype , Genome-Wide Association Study/methods
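The link the abstract draws between duplication rate and effective coverage is simple arithmetic: duplicate reads add bases but no new information, so for a fixed number of sequenced bases, a lower duplicate fraction leaves more usable depth. The sketch below assumes a genome length of about 3.1 Gb and treats all duplicates alike (the abstract refers specifically to optical duplicates); it ignores unmapped reads and other filters, and the two duplicate rates are made up.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate human genome length in bases


def effective_coverage(sequenced_bases, duplicate_rate, genome_size=HUMAN_GENOME_BP):
    """Mean depth from non-duplicate bases (ignores unmapped reads and other filters)."""
    if not 0.0 <= duplicate_rate < 1.0:
        raise ValueError("duplicate_rate must be in [0, 1)")
    return sequenced_bases * (1.0 - duplicate_rate) / genome_size


# Same sequencing yield, two hypothetical duplicate rates.
bases = 3.1e9  # enough raw bases for about 1x nominal coverage
for label, dup in [("run A", 0.02), ("run B", 0.15)]:
    print(label, round(effective_coverage(bases, dup), 2))
```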
17.
PLoS Genet ; 6(12): e1001236, 2010 Dec 09.
Article in English | MEDLINE | ID: mdl-21151575

ABSTRACT

While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.


Subject(s)
Alternative Splicing , Genetic Variation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Cell Line , Exons , Humans , Introns , RNA Splice Sites , RNA, Messenger/chemistry , Sequence Analysis, RNA
18.
PLoS Genet ; 6(6): e1000974, 2010 Jun 03.
Article in English | MEDLINE | ID: mdl-20532200

ABSTRACT

Although little is known about the role of the cystic fibrosis transmembrane conductance regulator (CFTR) gene in reproductive physiology, numerous variants in this gene have been implicated in the etiology of male infertility due to congenital bilateral absence of the vas deferens (CBAVD). Here, we studied the fertility effects of three CBAVD-associated CFTR polymorphisms, the (TG)m and polyT repeat polymorphisms in intron 8 and Met470Val in exon 10, in healthy men of European descent. Homozygosity for the Met470 allele was associated with lower birth rates, defined as the number of births per year of marriage (P = 0.0029). The Met470Val locus explained 4.36% of the phenotypic variance in birth rate, and men homozygous for the Met470 allele had 0.56 fewer children on average compared to Val470 carrier men. The derived Val470 allele occurs at high frequencies in non-African populations (allele frequency = 0.51 in HapMap CEU), whereas it is very rare in African populations (Fst = 0.43 between HapMap CEU and YRI). In addition, haplotypes bearing Val470 show a lack of genetic diversity and are thus longer than haplotypes bearing Met470 (measured by an integrated haplotype score [iHS] of -1.93 in HapMap CEU). The fraction of SNPs in the HapMap Phase2 data set with more extreme Fst and iHS measures is 0.003, consistent with a selective sweep outside of Africa. The fertility advantage conferred by Val470 relative to Met470 may provide a selective mechanism for these population genetic observations.


Subject(s)
Birth Rate , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Fertilization , Genetics, Population , Polymorphism, Genetic , Alleles , Haplotypes , Humans , Male , Methionine/genetics
19.
Front Genet ; 14: 1148301, 2023.
Article in English | MEDLINE | ID: mdl-37359370

ABSTRACT

The increasing incidence of bovine congestive heart failure (BCHF) in feedlot cattle poses a significant challenge to the beef industry due to economic loss, reduced performance, and reduced animal welfare attributed to cardiac insufficiency. Changes to cardiac morphology as well as abnormal pulmonary arterial pressure (PAP) in cattle of mostly Angus ancestry have recently been characterized. However, congestive heart failure affecting cattle late in the feeding period has been an increasing problem, and the industry needs tools to reduce feedlot mortality across multiple breeds. At harvest, a population of 32,763 commercial fed cattle was phenotyped for cardiac morphology, with associated production data collected from feedlot processing to harvest at a single feedlot and packing plant in the Pacific Northwest. A sub-population of 5,001 individuals was selected for low-pass genotyping to estimate variance components and genetic correlations between heart score and the production traits observed during the feeding period. At harvest, the incidence of a heart score of 4 or 5 in this population was approximately 4.14%, indicating that a significant proportion of feeder cattle are at risk of cardiac mortality before harvest. Heart scores were also significantly and positively correlated with the percentage of Angus ancestry estimated by genomic breed percentage analysis. The heritability of heart score analyzed as a binary trait (scores 1 and 2 = 0; scores 4 and 5 = 1) was 0.356 in this population, indicating that developing a selection tool, in the form of an EPD (expected progeny difference), to reduce the risk of congestive heart failure is feasible. Genetic correlations of heart score with growth traits and feed intake were moderate and positive (0.289-0.460). Genetic correlations of heart score with backfat and marbling score were -0.120 and -0.108, respectively.
Significant genetic correlations with traits of high economic importance in existing selection indexes explain the increased rate of congestive heart failure observed over time. These results indicate the potential to implement heart score observed at harvest as a phenotype under selection in genetic evaluation, in order to reduce feedlot mortality due to cardiac insufficiency and improve overall cardiopulmonary health in feeder cattle.
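The binary coding of heart score given above can be written out directly. In this minimal sketch, score 3 is assumed to be excluded from the binary trait (mapped to `None`), because the abstract assigns it to neither class. Incidence is computed over all scored animals as the share with a score of 4 or 5. The function names and toy scores are hypothetical.

```python
from typing import Optional


def binarize_heart_score(score: int) -> Optional[int]:
    """Map a 1-5 heart score to the binary trait: 1,2 -> 0; 4,5 -> 1.

    Assumption: score 3 is excluded from the binary analysis (returns None).
    """
    if score in (1, 2):
        return 0
    if score in (4, 5):
        return 1
    if score == 3:
        return None
    raise ValueError(f"heart score must be 1-5, got {score}")


def high_score_incidence(scores: list[int]) -> float:
    """Fraction of all scored animals with a heart score of 4 or 5."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(1 for s in scores if binarize_heart_score(s) == 1) / len(scores)


if __name__ == "__main__":
    scores = [1, 2, 2, 3, 4, 1, 5, 2, 1, 2]  # toy data, not from the study
    binary = [b for b in map(binarize_heart_score, scores) if b is not None]
    print(binary, high_score_incidence(scores))
```

Note that the incidence denominator includes score-3 animals, while the binary trait used for heritability estimation would not.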

20.
Bioinformatics ; 27(15): 2144-6, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21690102

ABSTRACT

MOTIVATION: Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences that have been incorrectly assembled and collapsed into a single copy. RESULTS: Using sequencing data from the 1000 Genomes Project, we systematically scanned the human genome for regions of unusually high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, nucleosome positions and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls. AVAILABILITY: Files for masking out these regions are available at eqtl.uchicago.edu
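The scan-and-mask idea can be sketched as follows, under stated assumptions. Per-base depth for one chromosome is averaged in fixed windows. A window is flagged as high-depth if its mean exceeds a fixed multiple (`fold`) of the median window depth. Adjacent flagged windows are merged into half-open intervals, and any peak overlapping a masked interval is dropped. The window size, the `fold` threshold, and both function names are hypothetical; the abstract does not state the cutoffs used to build the published mask files.

```python
from statistics import median


def high_depth_regions(depths: list[int], window: int = 100,
                       fold: float = 3.0) -> list[tuple[int, int]]:
    """Merged half-open [start, end) windows whose mean depth > fold * median window mean.

    Assumption: the fold-over-median threshold is illustrative only. If the
    median window depth is 0, every window with any coverage gets flagged.
    """
    win_means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]  # last chunk may be shorter
        win_means.append(sum(chunk) / len(chunk))
    if not win_means:
        return []
    cutoff = fold * median(win_means)
    regions: list[tuple[int, int]] = []
    for idx, m in enumerate(win_means):
        if m > cutoff:
            start = idx * window
            end = min(start + window, len(depths))
            if regions and regions[-1][1] == start:  # extend adjacent region
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


def mask_peaks(peaks: list[tuple[int, int]],
               regions: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep only peaks (half-open [start, end)) that overlap no masked region."""
    return [(s, e) for s, e in peaks
            if not any(s < r_end and r_start < e for r_start, r_end in regions)]


if __name__ == "__main__":
    depths = [10] * 300 + [100] * 200 + [10] * 300  # toy collapsed-repeat signal
    regions = high_depth_regions(depths)
    peaks = [(50, 120), (350, 360), (480, 520), (600, 650)]
    print(regions, mask_peaks(peaks, regions))
```

A median-based cutoff is robust to the collapsed repeats themselves, which would inflate a mean-based threshold.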


Subject(s)
Chromatin Immunoprecipitation/methods , Gene Dosage , Sequence Analysis/methods , Base Sequence , Computational Biology/methods , Genome, Human , High-Throughput Nucleotide Sequencing , Humans