Results 1-20 of 8,323
1.
Cell ; 187(9): 2336-2341.e5, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38582080

ABSTRACT

The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ∼6% of our genome and are linked to over 50 human diseases. Here, we introduce TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD can distinguish common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.


Subjects
Human Genome, Tandem Repeat Sequences, Humans, Tandem Repeat Sequences/genetics, Whole Genome Sequencing, Genetic Databases, DNA Repeat Expansion/genetics, Genome-Wide Association Study
2.
Cell ; 185(18): 3426-3440.e19, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055201

ABSTRACT

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.


Subjects
Human Genome, Whole Genome Sequencing, Female, High-Throughput Nucleotide Sequencing/methods, Humans, INDEL Mutation, Male, Single Nucleotide Polymorphism
3.
Cell ; 185(23): 4409-4427.e18, 2022 11 10.
Article in English | MEDLINE | ID: mdl-36368308

ABSTRACT

Fully understanding autism spectrum disorder (ASD) genetics requires whole-genome sequencing (WGS). We present the latest release of the Autism Speaks MSSNG resource, which includes WGS data from 5,100 individuals with ASD and 6,212 non-ASD parents and siblings (total n = 11,312). Examining a wide variety of genetic variants in MSSNG and the Simons Simplex Collection (SSC; n = 9,205), we identified ASD-associated rare variants in 718/5,100 individuals with ASD from MSSNG (14.1%) and 350/2,419 from SSC (14.5%). Considering genomic architecture, 52% were nuclear sequence-level variants, 46% were nuclear structural variants (including copy-number variants, inversions, large insertions, uniparental isodisomies, and tandem repeat expansions), and 2% were mitochondrial variants. Our study provides a guidebook for exploring genotype-phenotype correlations in families who carry ASD-associated rare variants and serves as an entry point to the expanded studies required to dissect the etiology in the ∼85% of the ASD population that remain idiopathic.


Subjects
Autism Spectrum Disorder, Autistic Disorder, Humans, Autism Spectrum Disorder/genetics, Genetic Predisposition to Disease, DNA Copy Number Variations/genetics, Genomics
4.
Cell ; 184(8): 2239-2254.e39, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33831375

ABSTRACT

Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.


Subjects
Genetic Heterogeneity, Neoplasms/genetics, DNA Copy Number Variations, Neoplasm DNA/chemistry, Neoplasm DNA/metabolism, Genetic Databases, Antineoplastic Drug Resistance/genetics, Humans, Neoplasms/pathology, Single Nucleotide Polymorphism, Whole Genome Sequencing
5.
Cell ; 184(13): 3426-3437.e8, 2021 06 24.
Article in English | MEDLINE | ID: mdl-33991487

ABSTRACT

We identified an emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant by viral whole-genome sequencing of 2,172 nasal/nasopharyngeal swab samples from 44 counties in California, a state in the western United States. Named B.1.427/B.1.429 to denote its two lineages, the variant emerged in May 2020 and increased from 0% to >50% of sequenced cases from September 2020 to January 2021, showing 18.6%-24% increased transmissibility relative to wild-type circulating strains. The variant carries three mutations in the spike protein, including an L452R substitution. We found 2-fold increased B.1.427/B.1.429 viral shedding in vivo and increased L452R pseudovirus infection of cell cultures and lung organoids, albeit decreased relative to pseudoviruses carrying the N501Y mutation common to variants B.1.1.7, B.1.351, and P.1. Antibody neutralization assays revealed 4.0- to 6.7-fold and 2.0-fold decreases in neutralizing titers from convalescent patients and vaccine recipients, respectively. The increased prevalence of a more transmissible variant in California exhibiting decreased antibody neutralization warrants further investigation.


Subjects
Neutralizing Antibodies/immunology, COVID-19/immunology, COVID-19/transmission, SARS-CoV-2/genetics, Coronavirus Spike Glycoprotein/immunology, Monoclonal Antibodies/immunology, Antiviral Antibodies/immunology, Humans, Mutation/genetics, Whole Genome Sequencing/methods
6.
Cell ; 179(3): 736-749.e15, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31626772

ABSTRACT

Underrepresentation of Asian genomes has hindered population and medical genetics research on Asians, leading to population disparities in precision medicine. By whole-genome sequencing of 4,810 Singapore Chinese, Malays, and Indians, we found 98.3 million SNPs and small insertions or deletions, over half of which are novel. Population structure analysis demonstrated great representation of Asian genetic diversity by three ethnicities in Singapore and revealed a Malay-related novel ancestry component. Furthermore, demographic inference suggested that Malays split from Chinese ∼24,800 years ago and experienced significant admixture with East Asians ∼1,700 years ago, coinciding with the Austronesian expansion. Additionally, we identified 20 candidate loci for natural selection, 14 of which harbored robust associations with complex traits and diseases. Finally, we show that our data can substantially improve genotype imputation in diverse Asian and Oceanian populations. These results highlight the value of our data as a resource to empower human genetics discovery across broad geographic regions.


Subjects
Population Genetics, Human Genome/genetics, Genetic Selection, Whole Genome Sequencing, Asian People/genetics, Female, Genotype, Humans, Malaysia/epidemiology, Male, Single Nucleotide Polymorphism/genetics, Singapore/epidemiology
7.
Cell ; 174(3): 758-769.e9, 2018 07 26.
Article in English | MEDLINE | ID: mdl-30033370

ABSTRACT

While mutations affecting protein-coding regions have been examined across many cancers, structural variants at the genome-wide level are still poorly defined. Through integrative deep whole-genome and -transcriptome analysis of 101 castration-resistant prostate cancer metastases (109X tumor/38X normal coverage), we identified structural variants altering critical regulators of tumorigenesis and progression not detectable by exome approaches. Notably, we observed amplification of an intergenic enhancer region 624 kb upstream of the androgen receptor (AR) in 81% of patients, correlating with increased AR expression. Tandem duplication hotspots also occur near MYC, in lncRNAs associated with post-translational MYC regulation. Classes of structural variations were linked to distinct DNA repair deficiencies, suggesting their etiology, including associations of CDK12 mutation with tandem duplications, TP53 inactivation with inverted rearrangements and chromothripsis, and BRCA2 inactivation with deletions. Together, these observations provide a comprehensive view of how structural variations affect critical regulators in metastatic prostate cancer.


Subjects
Genomic Structural Variation/genetics, Prostatic Neoplasms/genetics, Aged, Aged 80 and over, BRCA2 Protein/metabolism, Cyclin-Dependent Kinases/metabolism, DNA Copy Number Variations, Exome, Gene Expression Profiling/methods, Genomics/methods, Humans, Male, Middle Aged, Mutation, Neoplasm Metastasis/genetics, Proto-Oncogene Proteins c-myc/genetics, Proto-Oncogene Proteins c-myc/metabolism, Androgen Receptors/genetics, Androgen Receptors/metabolism, Tandem Repeat Sequences/genetics, Tumor Suppressor Protein p53/metabolism, Whole Genome Sequencing/methods
8.
Cell ; 174(2): 433-447.e19, 2018 07 12.
Article in English | MEDLINE | ID: mdl-29909985

ABSTRACT

Nearly all prostate cancer deaths are from metastatic castration-resistant prostate cancer (mCRPC), but there have been few whole-genome sequencing (WGS) studies of this disease state. We performed linked-read WGS on 23 mCRPC biopsy specimens and analyzed cell-free DNA sequencing data from 86 patients with mCRPC. In addition to frequent rearrangements affecting known prostate cancer genes, we observed complex rearrangements of the AR locus in most cases. Unexpectedly, these rearrangements include highly recurrent tandem duplications involving an upstream enhancer of AR in 70%-87% of cases compared with <2% of primary prostate cancers. A subset of cases displayed AR or MYC enhancer duplication in the context of a genome-wide tandem duplicator phenotype associated with CDK12 inactivation. Our findings highlight the complex genomic structure of mCRPC, nominate alterations that may inform prostate cancer treatment, and suggest that additional recurrent events in the non-coding mCRPC genome remain to be discovered.


Subjects
Castration-Resistant Prostatic Neoplasms/pathology, Androgen Receptors/genetics, Whole Genome Sequencing, Aged, Anilides/therapeutic use, Cyclin-Dependent Kinases/genetics, Cyclin-Dependent Kinases/metabolism, Genetic Enhancer Elements/genetics, Gene Duplication, Gene Rearrangement, Genes myc, Genetic Loci, Haplotypes, Humans, Male, Middle Aged, Neoplasm Metastasis, PTEN Phosphohydrolase/genetics, Phenotype, Prostate-Specific Antigen/blood, Castration-Resistant Prostatic Neoplasms/drug therapy, Castration-Resistant Prostatic Neoplasms/genetics, Protein Kinase Inhibitors/therapeutic use, Pyridines/therapeutic use
9.
Cell ; 168(3): 460-472.e14, 2017 01 26.
Article in English | MEDLINE | ID: mdl-28089356

ABSTRACT

Certain cell types function as factories, secreting large quantities of one or more proteins that are central to the physiology of the respective organ. Examples include surfactant proteins in lung alveoli, albumin in liver parenchyma, and lipase in the stomach lining. Whole-genome sequencing analysis of lung adenocarcinomas revealed noncoding somatic mutational hotspots near VMP1/MIR21 and indel hotspots in surfactant protein genes (SFTPA1, SFTPB, and SFTPC). Extrapolation to other solid cancers demonstrated highly recurrent and tumor-type-specific indel hotspots targeting the noncoding regions of highly expressed genes defining certain secretory cellular lineages: albumin (ALB) in liver carcinoma, gastric lipase (LIPF) in stomach carcinoma, and thyroglobulin (TG) in thyroid carcinoma. The sequence contexts of indels targeting lineage-defining genes were significantly enriched in the AATAATD DNA motif and specific chromatin contexts, including H3K27ac and H3K36me3. Our findings illuminate a prevalent and hitherto unrecognized mutational process linking cellular lineage and cancer.
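The abstract names the AATAATD motif. In IUPAC nucleotide notation, D stands for A, G or T (any base except C), so scanning for the motif needs a character class rather than a literal string. The following is a minimal, hypothetical illustration of locating the motif on both strands with Python's `re` module; it is not the authors' enrichment analysis, and the function name `find_aataatd` is invented for this example.

```python
import re

# IUPAC code D = A, G or T (anything but C).
# The reverse complement of AATAATD is HATTATT, where H = A, C or T.
# Lookahead patterns let finditer report overlapping hits.
FORWARD = re.compile(r"(?=AATAAT[AGT])")
REVERSE = re.compile(r"(?=[ACT]ATTATT)")


def find_aataatd(seq: str) -> list[tuple[int, str]]:
    """Hypothetical helper: 0-based start positions of AATAATD on either strand.

    '+' marks a match on the given strand; '-' marks a match of the reverse
    complement, i.e. the motif lies on the opposite strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


print(find_aataatd("CCAATAATGCC"))  # [(2, '+')]
print(find_aataatd("GGCATTATTGG"))  # [(2, '-')]
```

A real enrichment analysis would compare motif counts around indel sites against matched background sequence; this sketch covers only the pattern-matching step.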


Subjects
Cell Lineage, INDEL Mutation, Mutation, Neoplasms/genetics, Neoplasms/pathology, 3' Untranslated Regions, Adult, Aged, Aged 80 and over, Female, Humans, Male, Membrane Proteins/genetics, MicroRNAs/genetics, Middle Aged, Nucleotide Motifs, Single Nucleotide Polymorphism, Pulmonary Surfactant-Associated Proteins/genetics
10.
Mol Cell ; 80(6): 1123-1134.e4, 2020 12 17.
Article in English | MEDLINE | ID: mdl-33290743

ABSTRACT

Analyzing the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from clinical samples is crucial for understanding viral spread and evolution as well as for vaccine development. Existing RNA sequencing methods are demanding on user technique and time and, thus, not ideal for time-sensitive clinical samples; these methods are also not optimized for high performance on viral genomes. We developed a facile, practical, and robust approach for metagenomic and deep viral sequencing from clinical samples. We demonstrate the utility of our approach on pharyngeal, sputum, and stool samples collected from coronavirus disease 2019 (COVID-19) patients, successfully obtaining whole metatranscriptomes and complete high-depth, high-coverage SARS-CoV-2 genomes with high yield and robustness. With a shortened hands-on time from sample to virus-enriched sequencing-ready library, this rapid, versatile, and clinic-friendly approach will facilitate molecular epidemiology studies during current and future outbreaks.


Subjects
COVID-19/genetics, Viral Genome, High-Throughput Nucleotide Sequencing, Viral RNA/genetics, SARS-CoV-2/genetics, Whole Genome Sequencing, Animals, Humans, Mice, NIH 3T3 Cells, Viral RNA/metabolism, SARS-CoV-2/metabolism
11.
Mol Cell ; 80(3): 541-553.e5, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33068522

ABSTRACT

To address how genetic variation alters gene expression in complex cell mixtures, we developed direct nuclear tagmentation and RNA sequencing (DNTR-seq), which enables whole-genome and mRNA sequencing jointly in single cells. DNTR-seq readily identified minor subclones within leukemia patients. In a large-scale DNA damage screen, DNTR-seq was used to detect regions under purifying selection and identified genes where mRNA abundance was resistant to copy-number alteration, suggesting strong genetic compensation. mRNA sequencing (mRNA-seq) quality equals RNA-only methods, and the low positional bias of genomic libraries allowed detection of sub-megabase aberrations at ultra-low coverage. Each cell library is individually addressable and can be re-sequenced at increased depth, allowing multi-tiered study designs. Additionally, the direct tagmentation protocol enables coverage-independent estimation of ploidy, which can be used to identify cell singlets. Thus, DNTR-seq directly links each cell's state to its corresponding genome at scale, enabling routine analysis of heterogeneous tumors and other complex tissues.


Subjects
Gene Expression Profiling/methods, Single-Cell Analysis/methods, Whole Genome Sequencing/methods, Animals, Base Sequence/genetics, Tumor Cell Line, Gene Library, High-Throughput Nucleotide Sequencing/methods, Humans, RNA/genetics, Messenger RNA/genetics, DNA Sequence Analysis/methods
12.
Mol Cell ; 77(6): 1307-1321.e10, 2020 03 19.
Article in English | MEDLINE | ID: mdl-31954095

ABSTRACT

A comprehensive catalog of cancer driver mutations is essential for understanding tumorigenesis and developing therapies. Exome-sequencing studies have mapped many protein-coding drivers, yet few non-coding drivers are known because genome-wide discovery is challenging. We developed a driver discovery method, ActiveDriverWGS, and analyzed 120,788 cis-regulatory modules (CRMs) across 1,844 whole tumor genomes from the ICGC-TCGA PCAWG project. We found 30 CRMs with enriched SNVs and indels (FDR < 0.05). These frequently mutated regulatory elements (FMREs) were ubiquitously active in human tissues, showed long-range chromatin interactions and mRNA abundance associations with target genes, and were enriched in motif-rewiring mutations and structural variants. Genomic deletion of one FMRE in human cells caused proliferative deficiencies and transcriptional deregulation of cancer genes CCNB1IP1, CDH1, and CDKN2B, validating observations in FMRE-mutated tumors. Pathway analysis revealed further sub-significant FMREs at cancer genes and processes, indicating an unexplored landscape of infrequent driver mutations in the non-coding genome.
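The abstract reports 30 CRMs significant at FDR < 0.05 but does not say which multiple-testing procedure produced that threshold. Assuming the widely used Benjamini-Hochberg procedure, a minimal sketch of converting per-element p-values into FDR-adjusted values looks like this; the p-values are hypothetical and the function is not part of ActiveDriverWGS.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four regulatory elements.
pvals = [0.001, 0.02, 0.04, 0.2]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(significant)  # [0, 1]
```

Note that the element with raw p = 0.04 is not retained: its adjusted value is 0.04 × 4 / 3 ≈ 0.053, above the 0.05 cutoff.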


Subjects
Tumor Biomarkers/genetics, Chromatin/metabolism, Gene Regulatory Networks, Mutation, Neoplasms/genetics, Neoplasms/pathology, Nucleic Acid Regulatory Sequences, Cell Proliferation, Chromatin/genetics, Computational Biology/methods, DNA Mutational Analysis, Human Genome, HEK293 Cells, Humans
13.
Mol Cell ; 79(5): 728-740.e6, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32721385

ABSTRACT

Cytosine base editors (CBEs) generate C-to-T nucleotide substitutions in genomic target sites without inducing double-strand breaks. However, CBEs such as BE3 can cause genome-wide off-target changes via sgRNA-independent DNA deamination. By leveraging the orthogonal R-loops generated by SaCas9 nickase to mimic actively transcribed genomic loci that are more susceptible to cytidine deaminase, we set up a high-throughput assay for assessing sgRNA-independent off-target effects of CBEs in rice protoplasts. The reliability of this assay was confirmed by the whole-genome sequencing (WGS) of 10 base editors in regenerated rice plants. The R-loop assay was used to screen a series of rationally designed A3Bctd-BE3 variants for improved specificity. We obtained 2 efficient CBE variants, A3Bctd-VHM-BE3 and A3Bctd-KKR-BE3, and the WGS analysis revealed that these new CBEs eliminated sgRNA-independent DNA off-target edits in rice plants. Moreover, these 2 base editor variants were more precise at their target sites by producing fewer multiple C edits.


Subjects
Cytidine Deaminase/genetics, Cytosine, Gene Editing/methods, Minor Histocompatibility Antigens/genetics, Oryza/genetics, Cytosine/chemistry, Plant Genes, Humans, Mutation, Kinetoplastida Guide RNA/chemistry, Plant RNA/chemistry, Reproducibility of Results
14.
Am J Hum Genet ; 111(10): 2129-2138, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39270648

ABSTRACT

Large-scale, multi-ethnic whole-genome sequencing (WGS) studies, such as the National Human Genome Research Institute Genome Sequencing Program's Centers for Common Disease Genomics (CCDG), play an important role in increasing diversity for genetic research. Before performing association analyses, assessing Hardy-Weinberg equilibrium (HWE) is a crucial step in quality control procedures to remove low quality variants and ensure valid downstream analyses. Diverse WGS studies contain ancestrally heterogeneous samples; however, commonly used HWE methods assume that the samples are homogeneous. Therefore, directly applying these to the whole dataset can yield statistically invalid results. To account for this heterogeneity, HWE can be tested on subsets of samples that have genetically homogeneous ancestries and the results aggregated at each variant. To facilitate valid HWE subset testing, we developed a semi-supervised learning approach that predicts homogeneous ancestries based on the genotype. This method provides a convenient tool for estimating HWE in the presence of population structure and missing self-reported race and ethnicities in diverse WGS studies. In addition, assessing HWE within the homogeneous ancestries provides reliable HWE estimates that will directly benefit downstream analyses, including association analyses in WGS studies. We applied our proposed method to the CCDG dataset, predicting homogeneous genetic ancestry groups for 60,545 multi-ethnic WGS samples to assess HWE within each group.
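The statistical point, that pooling ancestrally heterogeneous samples can invalidate an HWE test, can be sketched with a standard 1-df chi-square HWE test. This sketch assumes ancestry labels are already assigned (it does not implement the authors' semi-supervised ancestry predictor) and, because the abstract does not state how per-group results are aggregated, it uses Fisher's method as a stand-in. All genotype counts are hypothetical.

```python
import math
import sys


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium for one biallelic variant."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic in this subset: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def fisher_combine(pvals: list[float]) -> float:
    """Fisher's method (assumed aggregation): -2*sum(ln p) ~ chi-square, 2k df."""
    k = len(pvals)
    # Floor p at the smallest positive float so log() never sees 0.
    x = -2.0 * sum(math.log(max(p, sys.float_info.min)) for p in pvals)
    half = x / 2
    # Closed-form survival function of a chi-square with even df (2k).
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


# Two hypothetical ancestry groups, each in HWE but with different
# allele frequencies (p = 0.9 and p = 0.1).
group1 = (81, 18, 1)
group2 = (1, 18, 81)
per_group = [hwe_chi2_pvalue(*g) for g in (group1, group2)]
pooled = hwe_chi2_pvalue(*(a + b for a, b in zip(group1, group2)))
print(per_group)                   # both close to 1.0
print(fisher_combine(per_group))   # close to 1.0: no evidence against HWE
print(pooled)                      # tiny: pooling creates a spurious heterozygote deficit
```

The pooled table (82, 36, 82) shows far fewer heterozygotes than the 100 expected at p = 0.5, even though neither group deviates from HWE on its own; this is the Wahlund effect that stratified testing avoids.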


Subjects
Supervised Machine Learning, Whole Genome Sequencing, Humans, Whole Genome Sequencing/methods, Human Genome, Population Genetics/methods, Ethnicity/genetics, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism, Genotype
15.
Am J Hum Genet ; 111(10): 2203-2218, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39260370

ABSTRACT

To identify modifier loci underlying variation in body mass index (BMI) in persons with cystic fibrosis (pwCF), we performed a genome-wide association study (GWAS). Utilizing longitudinal height and weight data, along with demographic information and covariates from 4,393 pwCF, we calculated AvgBMIz representing the average of per-quarter BMI Z scores. The GWAS incorporated 9.8M single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) > 0.005 extracted from whole-genome sequencing (WGS) of each study subject. We observed genome-wide significant association with a variant in FTO (FaT mass and Obesity-associated gene; rs28567725; p value = 1.21e-08; MAF = 0.41, β = 0.106; n = 4,393 individuals) and a variant within ADAMTS5 (A Disintegrin And Metalloproteinase with ThromboSpondin motifs 5; rs162500; p value = 2.11e-10; MAF = 0.005, β = -0.768; n = 4,085 pancreatic-insufficient individuals). Notably, BMI-associated variants in ADAMTS5 occur on a haplotype that is much more common in African (AFR, MAF = 0.183) than European (EUR, MAF = 0.006) populations (1000 Genomes project). A polygenic risk score (PRS) calculated using 924 SNPs (excluding 17 in FTO) showed significant association with AvgBMIz (p value = 2.2e-16; r² = 0.03). The association with variants in FTO and the PRS correlation reveal similarities in the genetic architecture of BMI between CF and the general population. Inclusion of Black individuals, in whom the single-gene disorder CF is much less common but genomic diversity is greater, facilitated detection of association with variants that are in LD with functional SNPs in ADAMTS5. Our results illustrate the importance of population diversity, particularly when attempting to identify variants that manifest only under certain physiologic conditions.
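Two derived quantities in this abstract are simple aggregates. AvgBMIz is described as the mean of per-quarter BMI Z scores, and a PRS is conventionally an additive sum of per-SNP effect sizes multiplied by effect-allele dosages. The abstract does not give the PRS weights or their source, so the sketch below assumes that standard additive construction; the function names, effect sizes and dosages are hypothetical, and the Z scores are taken as already computed against a growth reference.

```python
def average_bmi_z(quarterly_z: list[float]) -> float:
    """Mean of a person's per-quarter BMI Z scores (AvgBMIz as described)."""
    return sum(quarterly_z) / len(quarterly_z)


def polygenic_score(betas: list[float], dosages: list[list[float]]) -> list[float]:
    """Assumed additive PRS: per person, the sum over SNPs of beta_i * dosage_i.

    dosages[j][i] is the count (0-2, or an imputed dosage) of the effect
    allele of SNP i carried by person j.
    """
    return [sum(b * g for b, g in zip(betas, person)) for person in dosages]


betas = [0.10, -0.20, 0.05]        # hypothetical per-SNP effect sizes
dosages = [[2, 0, 1], [1, 2, 0]]   # two hypothetical individuals
print(polygenic_score(betas, dosages))          # approximately [0.25, -0.3]
print(average_bmi_z([0.5, 0.1, -0.3, 0.9]))     # approximately 0.3
```

Association between the PRS and AvgBMIz would then be tested by regressing one on the other across individuals, with covariates as appropriate.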


Subjects
Alpha-Ketoglutarate-Dependent Dioxygenase FTO, Body Mass Index, Cystic Fibrosis, Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Cystic Fibrosis/genetics, Male, Female, Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics, Adult, ADAMTS5 Protein/genetics, Child, Adolescent, Gene Frequency, Haplotypes, Genetic Predisposition to Disease, Young Adult, Obesity/genetics, Modifier Genes
16.
Am J Hum Genet ; 111(10): 2164-2175, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39226898

ABSTRACT

Variants that alter gene splicing are estimated to comprise up to a third of all disease-causing variants, yet they are hard to predict from DNA sequencing data alone. To overcome this, many groups are incorporating RNA-based analyses, which are resource intensive, particularly for diagnostic laboratories. There are thousands of functionally validated variants that induce mis-splicing; however, this information is not consolidated, and they are under-represented in ClinVar, which presents a barrier to variant interpretation and can result in duplication of validation efforts. To address this issue, we developed SpliceVarDB, an online database consolidating over 50,000 variants assayed for their effects on splicing in over 8,000 human genes. We evaluated over 500 published data sources and established a spliceogenicity scale to standardize, harmonize, and consolidate variant validation data generated by a range of experimental protocols. According to the strength of their supporting evidence, variants were classified as "splice-altering" (∼25%), "not splice-altering" (∼25%), and "low-frequency splice-altering" (∼50%), which correspond to weak or indeterminate evidence of spliceogenicity. Importantly, 55% of the splice-altering variants in SpliceVarDB are outside the canonical splice sites (5.6% are deep intronic). These variants can support the variant curation diagnostic pathway and can be used to provide the high-quality data necessary to develop more accurate in silico splicing predictors. The variants are accessible through an online platform, SpliceVarDB, with additional features for visualization, variant information, in silico predictions, and validation metrics. SpliceVarDB is a very large collection of splice-altering variants and is available at https://splicevardb.org.


Subjects
Genetic Databases, RNA Splicing, Humans, RNA Splicing/genetics, Genetic Variation, Alternative Splicing/genetics, Software
17.
Am J Hum Genet ; 111(5): 990-995, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to gains of 85K-218K variants. We further developed a metric to quantify the genetic distance of a target cohort relative to a reference cohort and showed that this metric largely explained the performance of MagicalRsq-X models. Finally, we found that MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would have been missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
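The evaluation metric here is the squared Pearson correlation between a per-variant quality estimate and the true R², which can be computed where sequenced genotypes are available. A minimal standard-library sketch of that metric follows; the per-variant values are hypothetical and the function name is illustrative, not part of MagicalRsq.

```python
def squared_pearson(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical per-variant values: an estimated quality score vs. the
# true R^2 measured against sequenced genotypes.
true_r2 = [0.95, 0.80, 0.40, 0.10, 0.60]
rsq_est = [0.90, 0.85, 0.55, 0.30, 0.50]
print(round(squared_pearson(rsq_est, true_r2), 3))
```

Comparing two QC metrics then amounts to computing this value for each against the same true R² and reporting the relative improvement.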


Subjects
Gene Frequency, Genotype, Single Nucleotide Polymorphism, Software, Humans, Cohort Studies, Linkage Disequilibrium, Genome-Wide Association Study/methods, Human Genome, Quality Control, Machine Learning, Whole Genome Sequencing/standards, Whole Genome Sequencing/methods
18.
Am J Hum Genet ; 111(7): 1271-1281, 2024 07 11.
Article in English | MEDLINE | ID: mdl-38843839

ABSTRACT

There is mounting evidence of the value of clinical genome sequencing (cGS) in individuals with suspected rare genetic disease (RGD), but cGS performance and impact on clinical care in a diverse population drawn from both high-income countries (HICs) and low- and middle-income countries (LMICs) has not been investigated. The iHope program, a philanthropic cGS initiative, established a network of 24 clinical sites in eight countries through which it provided cGS to individuals with signs or symptoms of an RGD and constrained access to molecular testing. A total of 1,004 individuals (median age, 6.5 years; 53.5% male) with diverse ancestral backgrounds (51.8% non-majority European) were assessed from June 2016 to September 2021. The diagnostic yield of cGS was 41.4% (416/1,004), with individuals from LMIC sites 1.7 times more likely to receive a positive test result compared to HIC sites (LMIC 56.5% [195/345] vs. HIC 33.5% [221/659], OR 2.6, 95% CI 1.9-3.4, p < 0.0001). A change in diagnostic evaluation occurred in 76.9% (514/668) of individuals. Change of management, inclusive of specialty referrals, imaging and testing, therapeutic interventions, and palliative care, was reported in 41.4% (285/694) of individuals, which increased to 69.2% (480/694) when genetic counseling and avoidance of additional testing were also included. Individuals from LMIC sites were as likely as their HIC counterparts to experience a change in diagnostic evaluation (OR 6.1, 95% CI 1.1-∞, p = 0.05) and change of management (OR 0.9, 95% CI 0.5-1.3, p = 0.49). Increased access to genomic testing may support diagnostic equity and the reduction of global health care disparities.
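The abstract reports both "1.7 times more likely" and "OR 2.6" for the same LMIC versus HIC comparison. Both follow from the counts given (LMIC 195/345, HIC 221/659) if the first figure is read as a ratio of proportions (risk ratio) and the second as an odds ratio; that reading is an inference from the arithmetic, not something the abstract states. The sketch below uses only those reported counts.

```python
def odds_ratio(pos_a: int, n_a: int, pos_b: int, n_b: int) -> float:
    """Odds of a positive result in group A divided by the odds in group B."""
    return (pos_a / (n_a - pos_a)) / (pos_b / (n_b - pos_b))


def risk_ratio(pos_a: int, n_a: int, pos_b: int, n_b: int) -> float:
    """Proportion positive in group A divided by the proportion in group B."""
    return (pos_a / n_a) / (pos_b / n_b)


# Diagnostic-yield counts from the abstract: LMIC 195/345, HIC 221/659.
print(round(risk_ratio(195, 345, 221, 659), 1))    # 1.7
print(round(odds_ratio(195, 345, 221, 659), 1))    # 2.6
print(round(100 * (195 + 221) / (345 + 659), 1))  # 41.4 (overall yield, %)
```

Because the positive-result rate is high in both groups, the odds ratio (about 2.58) is noticeably larger than the risk ratio (about 1.69).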


Subjects
Genetic Testing, Rare Diseases, Whole Genome Sequencing, Humans, Male, Rare Diseases/genetics, Rare Diseases/diagnosis, Female, Child, Genetic Testing/methods, Preschool Child, Adolescent, Adult, Infant, Inborn Genetic Diseases/genetics, Inborn Genetic Diseases/diagnosis
19.
Proc Natl Acad Sci U S A ; 121(30): e2403505121, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39012830

ABSTRACT

American chestnut (Castanea dentata) is a deciduous tree species of eastern North America that was decimated by the introduction of the chestnut blight fungus (Cryphonectria parasitica) in the early 20th century. Although millions of American chestnuts survive as root collar sprouts, these trees rarely reproduce. Thus, the species is considered functionally extinct. American chestnuts with improved blight resistance have been developed through interspecific hybridization followed by conspecific backcrossing, and by genetic engineering. Incorporating adaptive genomic diversity into these backcross families and transgenic lines is important for restoring the species across broad climatic gradients. To develop sampling recommendations for ex situ conservation of wild adaptive genetic variation, we coupled whole-genome resequencing of 384 stump sprouts with genotype-environment association analyses and found that the species range can be subdivided into three seed zones characterized by relatively homogeneous adaptive allele frequencies. We estimated that 21 to 29 trees per seed zone will need to be conserved to capture most extant adaptive diversity. We also resequenced the genomes of 269 backcross trees to understand the extent to which the breeding program has already captured wild adaptive diversity, and to estimate optimal reintroduction sites for specific families on the basis of their adaptive portfolio and future climate projections. Taken together, these results inform the development of an ex situ germplasm conservation and breeding plan to target blight-resistant breeding populations to specific environments and provide a blueprint for developing restoration plans for other imperiled tree species.


Subjects
Fagaceae, Plant Genome, Plant Diseases, Fagaceae/genetics, Fagaceae/microbiology, Plant Diseases/microbiology, Plant Diseases/genetics, Ascomycota/genetics, Genetic Variation, Disease Resistance/genetics, Climate
20.
Hum Mol Genet ; 33(19): 1643-1647, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-38970828

ABSTRACT

Systemic sclerosis (SSc) is a heterogeneous, rare autoimmune fibrosing disorder affecting connective tissue. Its etiology is largely unknown, and genome-wide association studies (GWAS) have suggested many genes as susceptibility loci of modest impact. Multiple factors can contribute to the pathological process of the disease, which makes it more difficult to identify possible disease-causing genetic alterations. In this study, we applied whole-genome sequencing (WGS) to 101 index family trios, supplemented, where available, with transcriptome sequencing of cultured fibroblasts from four patients and five family controls. Single nucleotide variants (SNVs) and copy number variants (CNVs) were examined, with emphasis on de novo variants. We also performed an enrichment test for rare variants in candidate genes previously proposed in association with systemic sclerosis. We identified 42 exonic and 34 ncRNA de novo SNVs in the 101 trios, out of a total of over 6,000 de novo variants genome-wide. We observed more de novo variants than expected in the PRKXP1 gene. We observed the same pattern in NEK7, together with increased expression in the patient group. Additionally, we observed significant enrichment of rare variants in candidate genes in the patient cohort, further supporting the complex, multifactorial etiology of systemic sclerosis. Our findings identify new candidate genes, including PRKXP1 and NEK7, for future studies in SSc. The rare-variant enrichment we observed in candidate genes previously proposed in association with SSc suggests that further effort should be devoted to investigating possible pathogenetic mechanisms associated with those genes.
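The abstract reports more de novo variants than expected in PRKXP1 without stating the test used. One common way to ask whether an observed count exceeds an expectation is a Poisson upper-tail probability, and the sketch below assumes that model. The expected count (which in practice would come from a gene-level de novo mutation-rate model) and the observed count are hypothetical, and the function name is illustrative.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), via 1 - CDF(observed - 1)."""
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


# Hypothetical: 3 de novo variants observed where 1.0 is expected.
print(poisson_upper_tail(3, 1.0))  # ≈ 0.0803
```

Computing 1 minus the CDF loses precision when the tail probability is very small; for such cases, summing the upper tail directly or using a statistics library would be more robust.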


Subjects
DNA Copy Number Variations, Genetic Predisposition to Disease, Genome-Wide Association Study, Single Nucleotide Polymorphism, Systemic Scleroderma, Whole Genome Sequencing, Humans, Systemic Scleroderma/genetics, Systemic Scleroderma/pathology, DNA Copy Number Variations/genetics, Male, Female, Adult, NIMA-Related Kinases/genetics, Middle Aged, Fibroblasts/metabolism, Fibroblasts/pathology