Results 1 - 10 of 10
1.
Mol Biol Evol ; 38(5): 2131-2151, 2021 May 04.
Article in English | MEDLINE | ID: mdl-33355662

ABSTRACT

Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
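
As a rough illustration of the data structure behind this abstract, the sketch below builds the prefix and divergence arrays of the standard (untemplated) positional Burrows-Wheeler transform and groups haplotypes that share a long identical stretch ending at the last site. It is not the TPBWT itself: the templating that makes the method robust to genotype and phasing errors is omitted, and the function names and the toy haplotype matrix are assumptions made for this example only.

```python
def pbwt_prefix_divergence(haps):
    """Return (a, d) after sweeping all sites of a binary haplotype matrix.

    a is the haplotype order sorted by reversed prefix; d[i] is the first
    site of the longest match between a[i] and a[i-1] that ends at the
    last site (d[i] == n_sites means no match at all).
    """
    n_haps, n_sites = len(haps), len(haps[0])
    a, d = list(range(n_haps)), [0] * n_haps
    for k in range(n_sites):
        p = q = k + 1  # match starts for the next 0-allele / 1-allele entry
        a0, a1, d0, d1 = [], [], [], []
        for i in range(n_haps):
            p, q = max(p, d[i]), max(q, d[i])
            if haps[a[i]][k] == 0:
                a0.append(a[i]); d0.append(p); p = 0
            else:
                a1.append(a[i]); d1.append(q); q = 0
        a, d = a0 + a1, d0 + d1
    return a, d


def blocks_sharing_suffix(a, d, n_sites, min_len):
    """Group haplotypes that are identical over at least the last min_len sites."""
    blocks, current = [], [a[0]]
    for i in range(1, len(a)):
        if d[i] <= n_sites - min_len:
            current.append(a[i])  # still inside a run of long matches
        else:
            if len(current) > 1:
                blocks.append(current)
            current = [a[i]]
    if len(current) > 1:
        blocks.append(current)
    return blocks


# Toy data (hypothetical): 4 haplotypes, 4 biallelic sites.
toy_haps = [[0, 1, 0, 1], [1, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 1]]
order, divergence = pbwt_prefix_divergence(toy_haps)
print(blocks_sharing_suffix(order, divergence, n_sites=4, min_len=3))
```

The sweep costs time proportional to the number of haplotypes times the number of sites, which is why this family of methods scales to very large panels.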


Subject(s)
Genome, Human , Haplotypes , Software , Algorithms , False Negative Reactions , False Positive Reactions , Humans , Mexico , Phylogeography
2.
Lancet Oncol ; 19(6): 785-798, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29753700

ABSTRACT

BACKGROUND: Medulloblastoma is associated with rare hereditary cancer predisposition syndromes; however, consensus medulloblastoma predisposition genes have not been defined and screening guidelines for genetic counselling and testing for paediatric patients are not available. We aimed to assess and define these genes to provide evidence for future screening guidelines. METHODS: In this international, multicentre study, we analysed patients with medulloblastoma from retrospective cohorts (International Cancer Genome Consortium [ICGC] PedBrain, Medulloblastoma Advanced Genomics International Consortium [MAGIC], and the CEFALO series) and from prospective cohorts from four clinical studies (SJMB03, SJMB12, SJYC07, and I-HIT-MED). Whole-genome sequences and exome sequences from blood and tumour samples were analysed for rare damaging germline mutations in cancer predisposition genes. DNA methylation profiling was done to determine consensus molecular subgroups: WNT (MBWNT), SHH (MBSHH), group 3 (MBGroup3), and group 4 (MBGroup4). Medulloblastoma predisposition genes were predicted on the basis of rare variant burden tests against controls without a cancer diagnosis from the Exome Aggregation Consortium (ExAC). Previously defined somatic mutational signatures were used to further classify medulloblastoma genomes into two groups, a clock-like group (signatures 1 and 5) and a homologous recombination repair deficiency-like group (signatures 3 and 8), and chromothripsis was investigated using previously established criteria. Progression-free survival and overall survival were modelled for patients with a genetic predisposition to medulloblastoma. FINDINGS: We included a total of 1022 patients with medulloblastoma from the retrospective cohorts (n=673) and the four prospective studies (n=349), from whom blood samples (n=1022) and tumour samples (n=800) were analysed for germline mutations in 110 cancer predisposition genes. 
In our rare variant burden analysis, we compared these against 53 105 sequenced controls from ExAC and identified APC, BRCA2, PALB2, PTCH1, SUFU, and TP53 as consensus medulloblastoma predisposition genes according to our rare variant burden analysis and estimated that germline mutations accounted for 6% of medulloblastoma diagnoses in the retrospective cohort. The prevalence of genetic predispositions differed between molecular subgroups in the retrospective cohort and was highest for patients in the MBSHH subgroup (20% in the retrospective cohort). These estimates were replicated in the prospective clinical cohort (germline mutations accounted for 5% of medulloblastoma diagnoses, with the highest prevalence [14%] in the MBSHH subgroup). Patients with germline APC mutations developed MBWNT and accounted for most (five [71%] of seven) cases of MBWNT that had no somatic CTNNB1 exon 3 mutations. Patients with germline mutations in SUFU and PTCH1 mostly developed infant MBSHH. Germline TP53 mutations presented only in childhood patients in the MBSHH subgroup and explained more than half (eight [57%] of 14) of all chromothripsis events in this subgroup. Germline mutations in PALB2 and BRCA2 were observed across the MBSHH, MBGroup3, and MBGroup4 molecular subgroups and were associated with mutational signatures typical of homologous recombination repair deficiency. In patients with a genetic predisposition to medulloblastoma, 5-year progression-free survival was 52% (95% CI 40-69) and 5-year overall survival was 65% (95% CI 52-81); these survival estimates differed significantly across patients with germline mutations in different medulloblastoma predisposition genes. INTERPRETATION: Genetic counselling and testing should be used as a standard-of-care procedure in patients with MBWNT and MBSHH because these patients have the highest prevalence of damaging germline mutations in known cancer predisposition genes. 
We propose criteria for routine genetic screening for patients with medulloblastoma based on clinical and molecular tumour characteristics. FUNDING: German Cancer Aid; German Federal Ministry of Education and Research; German Childhood Cancer Foundation (Deutsche Kinderkrebsstiftung); European Research Council; National Institutes of Health; Canadian Institutes for Health Research; German Cancer Research Center; St Jude Comprehensive Cancer Center; American Lebanese Syrian Associated Charities; Swiss National Science Foundation; European Molecular Biology Organization; Cancer Research UK; Hertie Foundation; Alexander and Margaret Stewart Trust; V Foundation for Cancer Research; Sontag Foundation; Musicians Against Childhood Cancer; BC Cancer Foundation; Swedish Council for Health, Working Life and Welfare; Swedish Research Council; Swedish Cancer Society; the Swedish Radiation Protection Authority; Danish Strategic Research Council; Swiss Federal Office of Public Health; Swiss Research Foundation on Mobile Communication; Masaryk University; Ministry of Health of the Czech Republic; Research Council of Norway; Genome Canada; Genome BC; Terry Fox Research Institute; Ontario Institute for Cancer Research; Pediatric Oncology Group of Ontario; The Family of Kathleen Lorette and the Clark H Smith Brain Tumour Centre; Montreal Children's Hospital Foundation; The Hospital for Sick Children: Sonia and Arthur Labatt Brain Tumour Research Centre, Chief of Research Fund, Cancer Genetics Program, Garron Family Cancer Centre, MDT's Garron Family Endowment; BC Childhood Cancer Parents Association; Cure Search Foundation; Pediatric Brain Tumor Foundation; Brainchild; and the Government of Ontario.


Subject(s)
Biomarkers, Tumor/genetics , Cerebellar Neoplasms/genetics , DNA Methylation , Genetic Testing/methods , Germ-Line Mutation , Medulloblastoma/genetics , Models, Genetic , Adolescent , Adult , Cerebellar Neoplasms/mortality , Cerebellar Neoplasms/pathology , Cerebellar Neoplasms/therapy , Child , Child, Preschool , DNA Mutational Analysis , Female , Gene Expression Profiling , Genetic Predisposition to Disease , Heredity , Humans , Infant , Male , Medulloblastoma/mortality , Medulloblastoma/pathology , Medulloblastoma/therapy , Pedigree , Phenotype , Predictive Value of Tests , Progression-Free Survival , Prospective Studies , Reproducibility of Results , Retrospective Studies , Risk Factors , Transcriptome , Exome Sequencing , Young Adult
3.
Am J Hum Genet ; 97(5): 631-46, 2015 Nov 05.
Article in English | MEDLINE | ID: mdl-26522470

ABSTRACT

The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries--such as "Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?"--with either "yes" or "no." Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
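
The idea of a likelihood-ratio test on yes/no beacon answers can be sketched as follows. This is a generic Bernoulli likelihood ratio, not the test statistic derived in the paper: the two per-query "yes" probabilities (`p_yes_null` for a non-member, `p_yes_member` for a member) are hypothetical inputs chosen for illustration, whereas the paper derives its own expressions from beacon size and error rates.

```python
import math


def log_likelihood_ratio(responses, p_yes_null, p_yes_member):
    """Sum the per-query log-likelihood ratio (member vs. non-member).

    responses: iterable of booleans, True when the beacon answered "yes"
    to a query for an allele carried by the target genome.
    Both probabilities are assumed to lie strictly between 0 and 1.
    """
    llr = 0.0
    for yes in responses:
        if yes:
            llr += math.log(p_yes_member / p_yes_null)
        else:
            llr += math.log((1.0 - p_yes_member) / (1.0 - p_yes_null))
    return llr


# Hypothetical numbers: a member's alleles are almost always present (0.99),
# a non-member's allele is present in the beacon half of the time (0.5).
answers = [True] * 10
print(log_likelihood_ratio(answers, p_yes_null=0.5, p_yes_member=0.99))
```

A large positive value favors membership; each "no" answer pulls the statistic sharply toward the null because a member's alleles should nearly always be present.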


Subject(s)
Genetic Privacy , Genetic Variation , Genome, Human , Information Dissemination/methods , Task Performance and Analysis , Haplotypes , High-Throughput Nucleotide Sequencing , Humans
4.
Bioinformatics ; 33(8): 1147-1153, 2017 Apr 15.
Article in English | MEDLINE | ID: mdl-28035032

ABSTRACT

Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a false positive rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, Real Time Genomics' multi-sample variant caller, and the GATK UnifiedGenotyper, respectively. Since NGS data may be accompanied by genotype data for the same samples, either collected concurrently with sequencing or from a previous study, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g. a different set of criteria to determine quality for rare versus common variants) and thereby provides insight into sequencing characteristics that indicate call quality for variants of different frequencies. Availability and Implementation: Code is available on GitHub at: https://github.com/suyashss/variant_validation. Contacts: suyashs@stanford.edu or mtaub@jhsph.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Whole Genome Sequencing/methods , Data Accuracy , Genome, Human , Genomics/methods , Genomics/standards , Genotype , Genotyping Techniques/methods , Genotyping Techniques/standards , High-Throughput Nucleotide Sequencing/standards , Humans , Whole Genome Sequencing/standards
5.
BMC Bioinformatics ; 17: 218, 2016 May 23.
Article in English | MEDLINE | ID: mdl-27216439

ABSTRACT

BACKGROUND: A number of large genomic datasets are being generated for studies of human ancestry and diseases. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. RESULTS: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5× speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. We demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project using this extension. CONCLUSIONS: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
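
The first improvement, inferring ancestry for new individuals while the cluster allele frequencies stay fixed, can be illustrated with a minimal sketch. It swaps in a plain expectation-maximization update rather than ADMIXTURE's actual optimizer, handles diploid sites only, and uses a hypothetical function name and toy frequencies; frequencies are assumed to lie strictly between 0 and 1.

```python
def project_ancestry(genotypes, freqs, n_iter=100):
    """Estimate ancestry fractions for one individual with frequencies fixed.

    genotypes: per-site counts (0, 1 or 2) of the counted allele.
    freqs: freqs[k][j] is the frequency of that allele in cluster k at site j.
    """
    n_clusters, n_sites = len(freqs), len(genotypes)
    q = [1.0 / n_clusters] * n_clusters  # start from equal ancestry
    for _ in range(n_iter):
        expected = [0.0] * n_clusters
        for j, g in enumerate(genotypes):
            # Probability of drawing each allele from the current mixture.
            mix1 = sum(q[k] * freqs[k][j] for k in range(n_clusters))
            mix0 = sum(q[k] * (1.0 - freqs[k][j]) for k in range(n_clusters))
            for k in range(n_clusters):
                # Expected number of this site's two allele copies from cluster k.
                expected[k] += g * q[k] * freqs[k][j] / mix1
                expected[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / mix0
        q = [e / (2.0 * n_sites) for e in expected]
    return q


# Toy reference panel (hypothetical): two clusters, 20 sites.
reference_freqs = [[0.9] * 20, [0.1] * 20]
print(project_ancestry([2] * 20, reference_freqs))
```

Because the frequencies are not re-estimated, each new individual can be processed independently, which is consistent with the speedup the abstract reports for projection onto a reference set.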


Subject(s)
Genetics, Population , Genomics/methods , Software , Black or African American/genetics , Female , Gene Frequency , HapMap Project , Humans , Male , Southwestern United States
6.
Nat Commun ; 14(1): 6172, 2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37794016

ABSTRACT

Atopic dermatitis (AD) is a common inflammatory skin condition and prior genome-wide association studies (GWAS) have identified 71 associated loci. In the current study we conducted the largest AD GWAS to date (discovery N = 1,086,394, replication N = 3,604,027), combining previously reported cohorts with additional available data. We identified 81 loci (29 novel) in the European-only analysis (which all replicated in a separate European analysis) and 10 additional loci in the multi-ancestry analysis (3 novel). Eight variants from the multi-ancestry analysis replicated in at least one of the populations tested (European, Latino or African), while two may be specific to individuals of Japanese ancestry. AD loci showed enrichment for DNase I hypersensitivity and eQTL associations in blood. At each locus we prioritised candidate genes by integrating multi-omic data. The implicated genes are predominantly in immune pathways of relevance to atopic inflammation and some offer drug repurposing opportunities.


Subject(s)
Dermatitis, Atopic , Genome-Wide Association Study , Humans , Dermatitis, Atopic/genetics , Genetic Predisposition to Disease/genetics , Hispanic or Latino/genetics , Black People , Polymorphism, Single Nucleotide
7.
Nat Neurosci ; 24(7): 954-963, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34045744

ABSTRACT

Major depressive disorder is the most common neuropsychiatric disorder, affecting 11% of veterans. Here we report results of a large meta-analysis of depression using data from the Million Veteran Program, 23andMe, UK Biobank and FinnGen, including individuals of European ancestry (n = 1,154,267; 340,591 cases) and African ancestry (n = 59,600; 25,843 cases). Transcriptome-wide association study analyses revealed significant associations with expression of NEGR1 in the hypothalamus and DRD2 in the nucleus accumbens, among others. We fine-mapped 178 genomic risk loci, and we identified likely pathogenicity in these variants and overlapping gene expression for 17 genes from our transcriptome-wide association study, including TRAF3. Finally, we were able to show substantial replications of our findings in a large independent cohort (n = 1,342,778) provided by 23andMe. This study sheds light on the genetic architecture of depression and provides new insight into the interrelatedness of complex psychiatric traits.


Subject(s)
Depressive Disorder, Major/genetics , Genetic Predisposition to Disease/genetics , Female , Genome-Wide Association Study , Humans , Male , Veterans
8.
Nat Commun ; 7: 12522, 2016 Oct 11.
Article in English | MEDLINE | ID: mdl-27725671

ABSTRACT

The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry.


Subject(s)
Black People/genetics , Gene Flow , Genome, Human , Human Migration , Base Sequence , DNA, Intergenic/genetics , Female , Genetic Heterogeneity , Geography , Humans , Male , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sexism
9.
PLoS One ; 10(6): e0129277, 2015.
Article in English | MEDLINE | ID: mdl-26110529

ABSTRACT

Population-scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remain a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data were processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future.
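
The cost figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and reading the gap between "all jobs" and "completed jobs" as spend on jobs that did not complete is an assumption.

```python
samples = 2535
cost_completed_usd = 18590  # total for completed jobs, from the abstract
cost_all_jobs_usd = 21805   # total for all jobs, from the abstract

# Per-sample cost under each accounting.
per_sample_completed = cost_completed_usd / samples
per_sample_all = cost_all_jobs_usd / samples

# Share of total spend not attributable to completed jobs (assumed overhead).
overhead_fraction = (cost_all_jobs_usd - cost_completed_usd) / cost_all_jobs_usd

print(f"completed jobs: ${per_sample_completed:.2f} per sample")
print(f"all jobs:       ${per_sample_all:.2f} per sample")
print(f"overhead share: {overhead_fraction:.1%}")
```

The completed-jobs figure reproduces the $7.33 per sample stated in the abstract.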


Subject(s)
Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Cloud Computing/economics , Databases, Genetic , High-Throughput Nucleotide Sequencing/economics , Humans , Software
10.
Science ; 349(6250): aab3884, 2015 Aug 21.
Article in English | MEDLINE | ID: mdl-26198033

ABSTRACT

How and when the Americas were populated remains contentious. Using ancient and modern genome-wide data, we found that the ancestors of all present-day Native Americans, including Athabascans and Amerindians, entered the Americas as a single migration wave from Siberia no earlier than 23 thousand years ago (ka) and after no more than an 8000-year isolation period in Beringia. After their arrival to the Americas, ancestral Native Americans diversified into two basal genetic branches around 13 ka, one that is now dispersed across North and South America and the other restricted to North America. Subsequent gene flow resulted in some Native Americans sharing ancestry with present-day East Asians (including Siberians) and, more distantly, Australo-Melanesians. Putative "Paleoamerican" relict populations, including the historical Mexican Pericúes and South American Fuego-Patagonians, are not directly related to modern Australo-Melanesians as suggested by the Paleoamerican Model.


Subject(s)
Human Migration/history , Indians, North American/history , Americas , Gene Flow , Genomics , History, Ancient , Humans , Indians, North American/genetics , Models, Genetic , Siberia