ABSTRACT
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advances stem not only from improvements in sequencing accuracy but also from rapidly evolving analysis methods. In this Review, we explore the profound impact of long-read sequencing on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advances in the field and provide an overview of the analytical methods available to fully leverage long reads. We discuss the advantages and disadvantages of long reads relative to short reads and trace their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field, such as the integration of methylation signals into sub-strain analysis and the lack of benchmarks.
Subjects
High-Throughput Nucleotide Sequencing , Metagenome , Metagenomics , Microbiota , Metagenomics/methods , Metagenome/genetics , High-Throughput Nucleotide Sequencing/methods , Microbiota/genetics , Humans , Sequence Analysis, DNA/methods , Computational Biology/methods
ABSTRACT
While polygenic risk scores (PRSs) enable early identification of genetic risk for chronic obstructive pulmonary disease (COPD), their predictive performance is limited when the discovery and target populations are not well matched. Hypothesizing that the biological mechanisms of disease are shared across ancestry groups, we introduce a PrediXcan-derived polygenic transcriptome risk score (PTRS) to improve the cross-ethnic portability of risk prediction. We constructed the PTRS using summary statistics from applying PrediXcan to large-scale GWASs of lung function (forced expiratory volume in 1 s [FEV1] and its ratio to forced vital capacity [FEV1/FVC]) in the UK Biobank. We examined the prediction performance and cross-ethnic portability of the PTRS through smoking-stratified analyses in 29,381 multi-ethnic participants from TOPMed population/family-based cohorts and in 11,771 multi-ethnic participants from TOPMed COPD-enriched studies. Analyses were carried out for two dichotomous COPD traits (moderate-to-severe and severe COPD) and two quantitative lung function traits (FEV1 and FEV1/FVC). Although the PTRS showed weaker associations with disease than the PRS in individuals of European ancestry, it showed a stronger association with COPD than the PRS in African Americans (e.g., for moderate-to-severe COPD among heavy smokers with ≥ 40 pack-years of smoking, odds ratio [OR] = 1.24 [95% confidence interval [CI]: 1.08-1.43] for the PTRS versus 1.10 [0.96-1.26] for the PRS). Cross-ethnic portability of the PTRS was significantly higher than that of the PRS (paired t test p < 2.2 × 10⁻¹⁶, with portability gains ranging from 5% to 28%) for both dichotomous COPD traits and across all smoking strata. Our study demonstrates the value of the PTRS for improved cross-ethnic portability compared with the PRS in predicting COPD risk.
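As a rough illustration of how a transcriptome-based score differs from a SNP-based PRS, the sketch below first imputes each gene's expression PrediXcan-style (a weighted sum of allele dosages) and then sums those predicted expression values weighted by gene-level effect sizes. This is a minimal sketch under stated assumptions, not the authors' pipeline: the gene names, SNP IDs, weights and effect sizes are hypothetical, and missing dosages are simply treated as 0.

```python
from typing import Dict

# Hypothetical inputs: per-SNP allele dosages (0-2) for one individual,
# per-gene expression-prediction weights, and per-gene association effects.
Dosages = Dict[str, float]
GeneModel = Dict[str, float]


def predicted_expression(dosages: Dosages, weights: GeneModel) -> float:
    """PrediXcan-style imputed expression: weighted sum of allele dosages.

    Variants absent from `dosages` contribute 0 (a simplification; real
    pipelines usually impute or mean-fill missing dosages)."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def ptrs(dosages: Dosages,
         gene_models: Dict[str, GeneModel],
         gene_effects: Dict[str, float]) -> float:
    """Polygenic transcriptome risk score: sum over genes of the gene-level
    effect size times the individual's predicted expression of that gene."""
    return sum(
        gene_effects[gene] * predicted_expression(dosages, weights)
        for gene, weights in gene_models.items()
        if gene in gene_effects
    )


dosages = {"rs1": 2, "rs2": 1, "rs3": 0}
gene_models = {"GENE_A": {"rs1": 0.5, "rs2": -0.2},
               "GENE_B": {"rs2": 0.3, "rs3": 1.0}}
gene_effects = {"GENE_A": 0.1, "GENE_B": -0.4}
print(round(ptrs(dosages, gene_models, gene_effects), 6))  # -0.04
```

Because the score is built from genes rather than individual SNPs, differences in linkage disequilibrium between ancestry groups are partly absorbed by the per-gene expression models, which is one intuition behind the portability hypothesis stated above.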
Subjects
Pulmonary Disease, Chronic Obstructive , Transcriptome , Humans , Lung , National Heart, Lung, and Blood Institute (U.S.) , Pulmonary Disease, Chronic Obstructive/genetics , Risk Factors , United States/epidemiology
ABSTRACT
Plasma levels of fibrinogen, coagulation factors VII and VIII and von Willebrand factor (vWF) are four intermediate phenotypes that are heritable and have been associated with the risk of clinical thrombotic events. To identify rare and low-frequency variants associated with these hemostatic factors, we conducted whole-exome sequencing in 10 860 individuals of European ancestry (EA) and 3529 African Americans (AAs) from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and the National Heart, Lung and Blood Institute's Exome Sequencing Project. Gene-based tests demonstrated significant associations with rare variation (minor allele frequency < 5%) in fibrinogen gamma chain (FGG) (with fibrinogen, P = 9.1 × 10⁻¹³), coagulation factor VII (F7) (with factor VII, P = 1.3 × 10⁻⁷²; seven novel variants) and VWF (with factor VIII and vWF; P = 3.2 × 10⁻¹⁴; one novel variant). These eight novel rare variant associations were independent of the known common variants at these loci and tended to have much larger effect sizes. In addition, one of the rare novel variants in F7 was significantly associated with an increased risk of venous thromboembolism in AAs (Ile200Ser; rs141219108; P = 4.2 × 10⁻⁵). After restricting gene-based analyses to loss-of-function variants only, a novel significant association was detected and replicated between factor VIII levels and a stop-gain mutation exclusive to AAs (rs3211938) in CD36 molecule (CD36). This variant has previously been linked to dyslipidemia but not to the levels of a hemostatic factor. These efforts represent the largest integration of whole-exome sequence data from two national projects to identify genetic variation associated with plasma hemostatic factors.
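Gene-based rare-variant tests pool the variants in a gene below a frequency cutoff so that individually rare alleles can be tested together. The abstract does not say which gene-based test was used; the sketch below shows only the simplest ingredient, a per-individual burden score counting minor alleles at variants with MAF below 5%, which a burden-type test would then regress on the trait. The function name and genotype layout are hypothetical.

```python
from typing import Dict, List


def burden_scores(genotypes: Dict[str, List[int]],
                  maf_threshold: float = 0.05) -> List[int]:
    """Per-individual count of rare minor alleles across one gene's variants.

    `genotypes` maps a variant ID to alternate-allele counts (0, 1 or 2),
    one entry per individual, in the same individual order for every variant."""
    if not genotypes:
        return []
    n = len(next(iter(genotypes.values())))
    scores = [0] * n
    for gts in genotypes.values():
        alt_freq = sum(gts) / (2 * len(gts))
        maf = min(alt_freq, 1 - alt_freq)
        if maf == 0 or maf >= maf_threshold:
            continue  # skip monomorphic and common variants
        flip = alt_freq > 0.5  # the reference allele is the minor allele
        for i, g in enumerate(gts):
            scores[i] += (2 - g) if flip else g
    return scores


gene = {
    "v1": [1] + [0] * 19,       # MAF 0.025 -> kept
    "v2": [1] * 10 + [0] * 10,  # MAF 0.25  -> excluded as common
    "v3": [2, 1] + [2] * 18,    # reference allele is minor, MAF 0.025 -> kept
}
print(burden_scores(gene)[:3])  # [1, 1, 0]
```

Counting the minor allele (rather than always the alternate allele) keeps the score meaningful when the reference allele happens to be the rare one.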
Subjects
Factor VIII , Hemostatics , Factor VII/genetics , Factor VIII/genetics , Fibrinogen/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Exome Sequencing , von Willebrand Factor/analysis , von Willebrand Factor/genetics
ABSTRACT
The development of the microbiome from infancy to childhood depends on a range of factors, and microbial-immune crosstalk during this period is thought to be involved in the pathobiology of later-life diseases¹⁻⁹ such as persistent islet autoimmunity and type 1 diabetes¹⁰⁻¹². However, to our knowledge, no studies have extensively characterized the early-life microbiome in a large, multi-centre population. Here we analyse longitudinal stool samples from 903 children between 3 and 46 months of age by 16S rRNA gene sequencing (n = 12,005) and metagenomic sequencing (n = 10,867), as part of The Environmental Determinants of Diabetes in the Young (TEDDY) study. We show that the developing gut microbiome undergoes three distinct phases of progression: a developmental phase (months 3-14), a transitional phase (months 15-30), and a stable phase (months 31-46). Receipt of breast milk, either exclusive or partial, was the factor most significantly associated with microbiome structure. Breastfeeding was associated with higher levels of Bifidobacterium species (B. breve and B. bifidum), and cessation of breast milk resulted in faster maturation of the gut microbiome, as marked by the phylum Firmicutes. Birth mode was also significantly associated with the microbiome during the developmental phase, driven by higher levels of Bacteroides species (particularly B. fragilis) in vaginally delivered infants. Bacteroides was also associated with increased gut diversity and faster maturation, regardless of birth mode. Environmental factors, including geographical location and household exposures (such as siblings and furry pets), were also important covariates. A nested case-control analysis revealed subtle associations between microbial taxonomy and the development of islet autoimmunity or type 1 diabetes.
These data determine the structural and functional assembly of the microbiome in early life and provide a foundation for targeted mechanistic investigation into the consequences of microbial-immune crosstalk for long-term health.
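"Gut diversity" in studies like this is typically summarized by an alpha-diversity index computed from taxon counts. The abstract does not state which metric was used; the sketch below assumes the Shannon index, H = -Σ pᵢ ln pᵢ, and uses hypothetical genus counts purely for illustration.

```python
import math
from typing import Mapping


def shannon_diversity(counts: Mapping[str, int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


even = {"Bacteroides": 25, "Bifidobacterium": 25, "Escherichia": 25, "Clostridium": 25}
skewed = {"Bacteroides": 97, "Bifidobacterium": 1, "Escherichia": 1, "Clostridium": 1}
print(shannon_diversity(even) > shannon_diversity(skewed))  # True
```

A community dominated by a single taxon scores near 0, while an evenly distributed community of k taxa reaches the maximum ln k.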
Subjects
Gastrointestinal Microbiome/immunology , Gastrointestinal Microbiome/physiology , Surveys and Questionnaires , Adolescent , Animals , Bifidobacterium/classification , Bifidobacterium/genetics , Bifidobacterium/isolation & purification , Breast Feeding/statistics & numerical data , Case-Control Studies , Child , Child, Preschool , Cluster Analysis , Datasets as Topic , Diabetes Mellitus, Type 1/immunology , Diabetes Mellitus, Type 1/microbiology , Female , Firmicutes/classification , Firmicutes/genetics , Firmicutes/isolation & purification , Gastrointestinal Microbiome/genetics , Humans , Infant , Male , Milk, Human/immunology , Milk, Human/microbiology , Pets , RNA, Ribosomal, 16S/genetics , Siblings , Time Factors
ABSTRACT
Whole-genome sequencing (WGS) can improve assessment of low-frequency and rare variants, particularly in non-European populations that have been underrepresented in existing genomic studies. The genetic determinants of C-reactive protein (CRP), a biomarker of chronic inflammation, have been extensively studied, with existing genome-wide association studies (GWASs) conducted in >200,000 individuals of European ancestry. To discover novel loci associated with CRP levels, we examined a multi-ancestry population (n = 23,279) with WGS (~38× coverage) from the Trans-Omics for Precision Medicine (TOPMed) program. We found evidence for eight distinct associations at the CRP locus, including two variants that have not been identified previously (rs11265259 and rs181704186), both of which are non-coding and more common in individuals of African ancestry (~10% and ~1% minor allele frequency, respectively, and rare or monomorphic in 1000 Genomes populations of East Asian, South Asian, and European ancestry). We show that the minor (G) allele of rs181704186 is associated with lower CRP levels and decreased transcriptional activity and protein binding in vitro, providing a plausible molecular mechanism for this African ancestry-specific signal. Individuals homozygous for rs181704186-G have a mean CRP level of 0.23 mg/L, compared with 2.97 mg/L in heterozygotes and 4.11 mg/L in major allele homozygotes. This study demonstrates the utility of WGS in multi-ethnic populations to drive discovery of complex trait associations of large effect and to identify functional alleles in noncoding regulatory regions.
Subjects
Asian People/genetics , Black People/genetics , C-Reactive Protein/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , White People/genetics , Whole Genome Sequencing/methods , Cohort Studies , Gene Frequency , Genome-Wide Association Study , Humans , Linkage Disequilibrium
ABSTRACT
PURPOSE: Cardiovascular disease (CVD) is the leading cause of death in adults in the United States, yet the benefits of genetic testing are not universally accepted. METHODS: We developed the "HeartCare" panel of genes associated with CVD, evaluating high-penetrance Mendelian conditions, coronary artery disease (CAD) polygenic risk, LPA gene polymorphisms, and specific pharmacogenetic (PGx) variants. We enrolled 709 individuals from cardiology clinics at Baylor College of Medicine, and samples were analyzed in a CAP/CLIA-certified laboratory. Results were returned to the ordering physician and uploaded to the electronic medical record. RESULTS: Notably, 32% of patients had a genetic finding with clinical management implications, even after excluding PGx results, including 9% who were molecularly diagnosed with a Mendelian condition. Among surveyed physicians, 84% reported medical management changes based on these results, including specialist referrals, cardiac tests, and medication changes. LPA polymorphisms and high polygenic risk of CAD were found in 20% and 9% of patients, respectively, leading to diet, lifestyle, and other changes. Warfarin and simvastatin pharmacogenetic variants were present in roughly half of the cohort. CONCLUSION: Our results support the use of genetic information in routine cardiovascular health management and provide a roadmap for accompanying research.
Subjects
Cardiology , Cardiovascular Diseases , Adult , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/genetics , Cardiovascular Diseases/therapy , Genetic Testing , Humans , Pharmacogenetics/methods , Pharmacogenomic Testing , United States
ABSTRACT
Oligopeptides are important markers of protein metabolism, as they are cleaved from larger polypeptides and proteins. Genetic association studies may help elucidate their origin and function. In 1,552 European Americans and 1,872 African Americans of the Atherosclerosis Risk in Communities study, we performed whole-genome and whole-exome sequencing and measured serum levels of 25 peptides. Common variants (minor allele frequency > 5%) were analysed individually. We grouped low-frequency variants (minor allele frequency ≤ 5%) by a genome-wide sliding window using region-based aggregate tests. Furthermore, low-frequency regulatory variants were grouped by gene, as were functional coding variants. All analyses were performed separately in each ancestry group and then meta-analysed. We identified 22 common variant associations with peptide levels (P-value < 4.2 × 10⁻¹⁰), including 16 novel gene-peptide pairs. Notably, variants in kinin-kallikrein genes KNG1, F12, KLKB1, and ACE were associated with several different peptides. Variants in KLKB1 and ACE were associated with a fragment of complement component 3f. Both common variants and low-frequency coding variants in CPN1 were associated with a fibrinogen cleavage peptide. Four sliding windows were significantly associated with peptide levels (P-value < 4.2 × 10⁻¹⁰). Our results highlight the importance of the kinin-kallikrein system in the regulation of serum peptide levels, strengthen the evidence for a broad link between the kinin-kallikrein and complement systems, and suggest a role of CPN1 in the conversion of fibrinogen to fibrin.
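The sliding-window grouping of low-frequency variants can be pictured as tiling the genome with fixed-width windows that advance by a fixed step, then collecting the variants that fall in each window as one unit for an aggregate test. The abstract does not give the window width or step; the values below are hypothetical, and the input is assumed to be the positions of variants that already passed the MAF ≤ 5% filter.

```python
from typing import List, Tuple

Window = Tuple[int, int, List[int]]


def sliding_windows(positions: List[int], window: int, step: int) -> List[Window]:
    """Group variant positions into half-open windows [start, start + window)
    whose starts lie on a grid of spacing `step`; empty windows are dropped."""
    if not positions:
        return []
    pos = sorted(positions)
    start = pos[0] - pos[0] % step  # first grid point at or before the first variant
    windows: List[Window] = []
    while start <= pos[-1]:
        end = start + window
        members = [p for p in pos if start <= p < end]
        if members:
            windows.append((start, end, members))
        start += step
    return windows


print(sliding_windows([100, 2500, 4100, 9000], window=4000, step=2000)[:2])
# [(0, 4000, [100, 2500]), (2000, 6000, [2500, 4100])]
```

With a step smaller than the window, consecutive windows overlap, so a variant can contribute to more than one test; this is one reason multiple-testing thresholds for window-based scans are stringent.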
Subjects
Atherosclerosis/genetics , Atherosclerosis/metabolism , Black or African American/genetics , Alleles , Atherosclerosis/blood , Exome/genetics , Female , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Kallikreins/blood , Kallikreins/genetics , Male , Middle Aged , Peptides/blood , Peptides/genetics , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Risk Factors , White People/genetics , Whole Genome Sequencing
ABSTRACT
Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and are thus probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; the vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers (including NF1, APC, RB1 and ATM) and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by integration with single nucleotide polymorphism array and gene expression array data. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma and suggest new molecular targets for treatment.
Subjects
Adenocarcinoma, Bronchiolo-Alveolar/genetics , Lung Neoplasms/genetics , Mutation/genetics , Female , Gene Dosage , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor , Humans , Male , Proto-Oncogenes/genetics
ABSTRACT
Respiratory syncytial virus (RSV) infection in immunocompromised individuals often leads to prolonged illness, progression to severe lower respiratory tract infection, and even death. How the host immune environment of adult hematopoietic stem cell transplant (HCT) recipients affects viral genetic variation during an acute infection is not well understood. In the present study, we performed whole genome sequencing of RSV/A or RSV/B from samples collected longitudinally from HCT adults with normal (<14 days) and delayed (≥14 days) RSV clearance who were enrolled in a ribavirin trial. We determined the inter-host and intra-host genetic variation of RSV and the effect of mutations on putative glycosylation sites. Inter-host variation of RSV was concentrated in the attachment (G) and fusion (F) glycoprotein genes, followed by the polymerase (L) and matrix (M) genes. Interestingly, overall genetic variation was similar between the normal and delayed clearance groups for both RSV/A and RSV/B. Intra-host variation occurred primarily in the G gene, followed by the non-structural protein (NS1) and L genes; however, gain or loss of stop codons and frameshift mutations appeared only in the G gene and only in the delayed viral clearance group. Potential gain or loss of O-linked glycosylation sites in the G gene occurred in both RSV/A and RSV/B isolates. In the RSV F gene, loss of an N-linked glycosylation site within an antigenic epitope occurred in three RSV/B isolates. Neither oral nor aerosolized ribavirin caused any mutations in the L gene. In summary, prolonged viral shedding and immune deficiency resulted in RSV variation, especially structural mutations in the G gene, possibly associated with immune evasion. Sequencing and monitoring of RSV isolates from immunocompromised patients are therefore crucial, because such infections can generate escape mutants that may affect the effectiveness of upcoming vaccines and treatments.
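Putative N-linked glycosylation sites are conventionally predicted from the sequon N-X-S/T, where X is any residue except proline; O-linked sites have no comparably simple motif. The sketch below scans two protein sequences for sequons and reports gained and lost sites. It is an illustration only: the sequences are hypothetical, and position-by-position comparison assumes the sequences are aligned without indels, so frameshifted proteins would need a proper alignment first.

```python
import re
from typing import List, Set, Tuple

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein: str) -> Set[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(protein.upper())}


def sequon_changes(reference: str, variant: str) -> Tuple[List[int], List[int]]:
    """(gained, lost) sequon positions; assumes the two sequences are
    aligned position-for-position (no indels)."""
    ref, var = n_glyc_sites(reference), n_glyc_sites(variant)
    return sorted(var - ref), sorted(ref - var)


print(sequon_changes("MNKSAANGT", "MNKAAANGT"))  # ([], [2])
```

A sequon is only a prerequisite for glycosylation; whether a site is actually occupied depends on structural context, which is why the abstract speaks of "putative" sites.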
ABSTRACT
The abundance of Lp(a) protein has significant implications for cardiovascular disease (CVD) risk and is directly affected by the copy number (CN) of KIV-2, a 5.5 kbp sub-region of the LPA gene. KIV-2 is highly polymorphic in the population, and accurate analysis is challenging. In this study, we present the DRAGEN KIV-2 CN caller, which uses short reads. Across 166 WGS samples, the caller showed high accuracy compared with optical mapping and could further phase approximately 50% of the samples. We compared KIV-2 CN estimates with 24 previously postulated KIV-2-relevant SNVs, revealing that many are ineffective predictors of KIV-2 copy number. Population studies, including USA-based cohorts, showed distinct KIV-2 CN distributions for European-, African-, and Hispanic-American populations and further underscored the limitations of SNV predictors. We demonstrate that the CN estimates correlate significantly with available Lp(a) protein levels and that phasing is highly important.
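The basic intuition behind estimating a tandem-repeat copy number from short reads is a depth ratio: reads from every KIV-2 unit in the sample pile up on the reference copies, so total depth over one unit, divided by the depth expected from a single haploid copy, approximates the total copy number across both alleles. The sketch below shows only that generic idea; it is not the DRAGEN algorithm (whose details the abstract does not give), and the function name and input values are hypothetical. It ignores GC bias, mapping-quality filtering and phasing.

```python
def kiv2_copy_number(kiv2_depth: float, baseline_depth: float) -> float:
    """Rough total (both alleles) KIV-2 copy number from read depth.

    kiv2_depth: mean depth over one KIV-2 unit after summing the depth of
        all reference KIV-2 copies, so that reads from every unit in the
        sample are counted once.
    baseline_depth: mean depth over nearby single-copy sequence (CN = 2)."""
    if baseline_depth <= 0:
        raise ValueError("baseline depth must be positive")
    haploid_depth = baseline_depth / 2  # depth contributed by one copy
    return kiv2_depth / haploid_depth


print(kiv2_copy_number(kiv2_depth=600.0, baseline_depth=30.0))  # 40.0
```

A depth ratio gives only the diploid total; splitting it into per-allele copy numbers (phasing) requires additional information, which is consistent with the abstract reporting phasing for only about half of the samples.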
Subjects
Alleles , Cardiovascular Diseases , Lipoprotein(a) , Humans , Cardiovascular Diseases/genetics , Lipoprotein(a)/genetics , Lipoprotein(a)/blood , DNA Copy Number Variations , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
ABSTRACT
Respiratory syncytial virus (RSV) is the leading cause of lower respiratory tract infections in children worldwide, while human noroviruses (HuNoV) are a leading cause of epidemic and sporadic acute gastroenteritis. Generating full-length genome sequences for these viruses is crucial for understanding viral diversity and tracking emerging variants. However, obtaining high-quality sequencing data is often challenging because of viral strain variability, sample quality, and low titers. Here, we present a set of comprehensive oligonucleotide probe sets designed from 1,570 RSV and 1,376 HuNoV isolate sequences in GenBank. Using these probe sets and a capture enrichment sequencing workflow, 85 RSV-positive nasal swab samples and 55 HuNoV-positive samples (49 stool samples and six human intestinal enteroids) encompassing major subtypes and genotypes were characterized. The Ct values of these samples ranged from 17.0 to 29.9 for RSV and from 20.2 to 34.8 for HuNoV, with some HuNoV samples below the detection limit. The mean percentage of post-processing reads mapped to viral genomes was 85.1% for RSV and 40.8% for HuNoV in post-capture libraries, compared with 0.08% and 1.15%, respectively, in pre-capture libraries. Full-length genomes were >99% complete in all RSV-positive samples and >96% complete in 47/55 HuNoV-positive samples, a significant improvement over genome recovery from pre-capture libraries. RSV transcriptome (subgenomic mRNA) sequences were also characterized from these data. Probe-based capture enrichment offers a comprehensive approach for RSV and HuNoV genome sequencing and for monitoring emerging variants.
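The two headline metrics above, the percentage of reads mapped to the viral genome and genome completeness, are simple ratios. The sketch below computes both, assuming completeness means the percentage of reference positions covered by at least `min_depth` reads; the abstract does not state the depth threshold actually used, so `min_depth` and the example numbers are hypothetical.

```python
from typing import List


def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Percentage of (post-processing) reads that map to the viral reference."""
    return 100.0 * mapped_reads / total_reads if total_reads else 0.0


def genome_completeness(depth_per_position: List[int], min_depth: int = 1) -> float:
    """Percentage of reference positions covered by at least `min_depth` reads."""
    if not depth_per_position:
        return 0.0
    covered = sum(1 for d in depth_per_position if d >= min_depth)
    return 100.0 * covered / len(depth_per_position)


print(percent_mapped(851, 1000))                            # 85.1
print(genome_completeness([0, 3, 5, 10], min_depth=3))      # 75.0
```

Raising `min_depth` makes completeness more conservative, which matters for low-titer samples where coverage is patchy.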
ABSTRACT
BACKGROUND: Risk for venous thromboembolism has a strong genetic component. Whole genome sequencing from the TOPMed program (Trans-Omics for Precision Medicine) allowed us to look for new associations, particularly rare variants missed by standard genome-wide association studies. METHODS: The 3793 cases and 7834 controls (11.6% of cases were individuals of African, Hispanic/Latino, or Asian ancestry) were analyzed using a single variant approach and an aggregate gene-based approach using our primary filter (included only loss-of-function and missense variants predicted to be deleterious) and our secondary filter (included all missense variants). RESULTS: Single variant analyses identified associations at 5 known loci. Aggregate gene-based analyses identified only PROC (odds ratio, 6.2 for carriers of rare variants; P=7.4×10⁻¹⁴) when using our primary filter. Employing our secondary variant filter led to a smaller effect size at PROC (odds ratio, 3.8; P=1.6×10⁻¹⁴), while excluding variants found only in rare isoforms led to a larger one (odds ratio, 7.5). Different filtering strategies improved the signal for 2 other known genes: PROS1 became significant (minimum P=1.8×10⁻⁶ with the secondary filter), while SERPINC1 did not (minimum P=4.4×10⁻⁵ with minor allele frequency <0.0005). Results were largely the same when restricting the analyses to include only unprovoked cases; however, one novel gene, MS4A1, became significant (P=4.4×10⁻⁷ using all missense variants with minor allele frequency <0.0005). CONCLUSIONS: Here, we have demonstrated the importance of using multiple variant filtering strategies, as we detected additional genes when filtering variants based on their predicted deleteriousness, frequency, and presence on the most expressed isoforms. Our primary analyses did not identify new candidate loci; thus, larger follow-up studies are needed to replicate the novel MS4A1 locus and to identify additional rare variation associated with venous thromboembolism.
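The primary and secondary filters differ only in which missense variants qualify, and both are combined with a MAF ceiling before aggregation. The sketch below expresses them as predicates. Several details are assumptions not stated in the abstract: the exact loss-of-function consequence set (here four Sequence Ontology terms), the source of the "predicted deleterious" flag, and that the secondary filter also retains loss-of-function variants. The variant records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed loss-of-function consequence set (Sequence Ontology terms as used by VEP).
LOF = {"stop_gained", "frameshift_variant",
       "splice_acceptor_variant", "splice_donor_variant"}


@dataclass
class Variant:
    vid: str
    consequence: str             # e.g. "missense_variant"
    maf: float
    predicted_deleterious: bool  # output of some in silico predictor (assumed)


def primary(v: Variant) -> bool:
    """Loss-of-function, or missense predicted to be deleterious."""
    return v.consequence in LOF or (
        v.consequence == "missense_variant" and v.predicted_deleterious)


def secondary(v: Variant) -> bool:
    """Loss-of-function (assumed retained) or any missense variant."""
    return v.consequence in LOF or v.consequence == "missense_variant"


def qualifying(variants: List[Variant], keep: Callable[[Variant], bool],
               max_maf: float) -> List[str]:
    """IDs of variants passing a functional filter and a MAF ceiling."""
    return [v.vid for v in variants if v.maf < max_maf and keep(v)]


variants = [
    Variant("v1", "stop_gained", 0.0001, False),
    Variant("v2", "missense_variant", 0.0003, True),
    Variant("v3", "missense_variant", 0.0002, False),
    Variant("v4", "synonymous_variant", 0.0001, False),
    Variant("v5", "missense_variant", 0.01, True),
]
print(qualifying(variants, primary, 0.0005))    # ['v1', 'v2']
print(qualifying(variants, secondary, 0.0005))  # ['v1', 'v2', 'v3']
```

Keeping the filters as interchangeable predicates makes it easy to rerun the same gene-based test under several filtering strategies, which is the comparison the abstract emphasizes; an isoform restriction could be added as another predicate in the same way.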
Subjects
Genome-Wide Association Study , Venous Thromboembolism , Humans , Venous Thromboembolism/genetics , Precision Medicine , Genetic Predisposition to Disease , Gene Frequency
ABSTRACT
Ever larger structural variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel, ≥50 bp) across the autosomes and the X chromosome from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference, which result in high variant quality and >90% allele concordance compared with long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We identified 690 SV hotspots and deserts, including some that potentially affect the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SVs on disease development and progression.
ABSTRACT
The fundamental challenge of multi-sample structural variant (SV) analyses such as merging and benchmarking is identifying when two SVs are the same. Common approaches for comparing SVs were developed alongside technologies that produce ill-defined boundaries. As SV detection becomes more exact, algorithms that preserve this refined signal are needed. Here, we present Truvari, an SV comparison, annotation, and analysis toolkit, and demonstrate the effect of SV comparison choices by building population-level VCFs from 36 haplotype-resolved long-read assemblies. We observe over-merging by other SV merging approaches that causes up to a 2.2× inflation of allele frequency relative to Truvari.
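Deciding whether two SV calls are "the same" usually combines several checks: same chromosome and type, nearby breakpoints, similar sizes and, for events with a reference span, sufficient reciprocal overlap. The sketch below implements those checks in the spirit of Truvari's comparison but is not Truvari's code: the thresholds are illustrative placeholders (consult the Truvari documentation for its actual parameters and defaults), and it omits the allele-sequence similarity comparison that Truvari also performs.

```python
from dataclasses import dataclass


@dataclass
class SV:
    chrom: str
    start: int   # 0-based start of the affected reference span
    end: int     # exclusive end (start == end for an insertion)
    svtype: str  # e.g. "DEL", "INS"
    size: int    # absolute SV length in bp


def size_similarity(a: SV, b: SV) -> float:
    big = max(a.size, b.size)
    return min(a.size, b.size) / big if big else 1.0


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap divided by the longer reference span (0 for insertions)."""
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start))
    span = max(a.end - a.start, b.end - b.start)
    return overlap / span if span else 0.0


def same_sv(a: SV, b: SV, max_ref_dist: int = 500,
            min_size_sim: float = 0.7, min_overlap: float = 0.0) -> bool:
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.start - b.start) > max_ref_dist or abs(a.end - b.end) > max_ref_dist:
        return False
    if size_similarity(a, b) < min_size_sim:
        return False
    return reciprocal_overlap(a, b) >= min_overlap


a = SV("chr1", 1000, 1300, "DEL", 300)
b = SV("chr1", 1010, 1290, "DEL", 280)
c = SV("chr1", 1000, 1100, "DEL", 100)
print(same_sv(a, b), same_sv(a, c))  # True False
```

Looser thresholds merge more calls; when distinct alleles are merged into one record, their carriers are pooled, which is how over-merging inflates allele frequencies as described above.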
Subjects
Algorithms , Genomic Structural Variation , Humans , Gene Frequency , Alleles , Benchmarking , High-Throughput Nucleotide Sequencing , Genome, Human
ABSTRACT
BACKGROUND: Routine genome-wide screening for cardiovascular disease risk may inform clinical decision-making. However, little is known about whether clinicians and patients would find such testing useful or acceptable within the context of a genomics-enabled learning health system. METHODS: We conducted surveys with patients and their clinicians who were participating in the HeartCare Study, a precision cardiology care project that returned results from a next-generation sequencing panel of 158 genes associated with cardiovascular disease risk. Six weeks after return of results, we assessed patients' and clinicians' perceived utility and disutility of HeartCare, the effect of the test on clinical recommendations, and patients' attitudes toward integration of research and clinical care. RESULTS: Among 666 HeartCare patients with a result returned during the survey study period, 42.0% completed a full or partial survey. Patient-participants who completed a full survey (n=224) generally had positive perceptions of HeartCare independent of whether they received a positive or negative result. Most patient-participants considered genetic testing for cardiovascular disease risk to have more benefit than risk (88.3%) and agreed that it provided information that they wanted to know (81.2%), while most disagreed that the test caused them to feel confused (77.7%) or overwhelmed (78.0%). For 122 of their patients with positive results, clinicians (n=13) reported making changes in clinical care for 66.4% of patients, recommending changes in health behaviors for 36.9% of patients, and recommending to 33.6% of patients that their family members have clinical testing. CONCLUSIONS: Both patients and clinicians thought the HeartCare panel screen for cardiovascular disease risk provided information that was useful in terms of personal or health benefits to the patient and that informed clinical care without causing patients to be confused or overwhelmed. 
Further research is needed to assess perceptions of genome-wide screening among the US cardiology clinic population.
Subjects
Cardiology , Cardiovascular Diseases , Humans , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/genetics , Cardiovascular Diseases/therapy , Surveys and Questionnaires , Family , Clinical Decision-Making
ABSTRACT
BACKGROUND: Cryptosporidium parvum is an apicomplexan parasite commonly found across many host species, with a global infection prevalence in human populations of 7.6%. Understanding its diversity and genomic makeup can help in fighting established infections and preventing further transmission. The basis of every genomic study is a high-quality reference genome with continuity and completeness, which enables comprehensive comparative studies. FINDINGS: Here, we provide a highly accurate and complete reference genome of Cryptosporidium parvum. The assembly is based on Oxford Nanopore reads and was improved using Illumina reads for error correction. We also outline how to evaluate and choose among different assembly methods based on 2 main approaches that can be applied to other Cryptosporidium species. The assembly encompasses 8 chromosomes and includes 13 resolved telomeres. Overall, the assembly shows a high completion rate, with 98.4% single-copy BUSCO genes. CONCLUSIONS: This high-quality reference genome of a zoonotic IIaA17G2R1 C. parvum subtype isolate provides the basis for subsequent comparative genomic studies across the Cryptosporidium clade. This will enable improved diversity, functional, and association studies.
Subjects
Cryptosporidiosis , Cryptosporidium parvum , Cryptosporidium , Cryptosporidiosis/epidemiology , Cryptosporidiosis/genetics , Cryptosporidiosis/parasitology , Cryptosporidium/genetics , Cryptosporidium parvum/genetics , Genome , Genomics/methods , Humans
ABSTRACT
BACKGROUND: The All of Us Research Program (AoURP, "the program") is an initiative, sponsored by the National Institutes of Health (NIH), that aims to enroll one million people (or more) across the USA. Through repeated engagement of participants, a research resource is being created to enable a variety of future observational and interventional studies. The program has also committed to genomic data generation and returning important health-related information to participants. METHODS: Whole-genome sequencing (WGS), variant calling processes, data interpretation, and return-of-results procedures had to be created and receive an Investigational Device Exemption (IDE) from the United States Food and Drug Administration (FDA). The performance of the entire workflow was assessed through the largest known cross-center, WGS-based, validation activity that was refined iteratively through interactions with the FDA over many months. RESULTS: The accuracy and precision of the WGS process as a device for the return of certain health-related genomic results was determined to be sufficient, and an IDE was granted. CONCLUSIONS: We present here both the process of navigating the IDE application process with the FDA and the results of the validation study as a guide to future projects which may need to follow a similar path. Changes to the program in the future will be covered in supplementary submissions to the IDE and will support additional variant classes, sample types, and any expansion to the reportable regions.
Subjects
Pharmacogenetics , Population Health , Genomics , Humans , United States , Whole Genome Sequencing/methods
ABSTRACT
BACKGROUND: Genomic medicine is poised to improve care for common complex diseases such as epilepsy, but additional clinical informatics and implementation science research is needed for it to become part of the standard of care. Epilepsy is an exemplary complex neurological disorder for which DNA diagnostics have been shown to be advantageous for patient care. OBJECTIVE: We designed the Implementation Science for Genomic Health Translation (INSIGHT) study to leverage the fact that both the clinic and testing laboratory control the development and customization of their respective electronic health records and clinical reporting platforms. Through INSIGHT, we can rapidly prototype and benchmark novel approaches to incorporating clinical genomics into patient care. Of particular interest are clinical decision support tools that take advantage of domain knowledge from clinical genomics and can be rapidly adjusted based on feedback from clinicians. METHODS: Building on previously developed evidence and infrastructure components, our model includes the following: establishment of an intervention-ready genomic knowledge base for patient care, creation of a health informatics platform and linking it to a clinical genomics reporting system, and scaling and evaluation of INSIGHT following established implementation science principles. RESULTS: INSIGHT was approved by the Institutional Review Board at the University of Texas Health Science Center at Houston on May 15, 2020, and is designed as a 2-year proof-of-concept study beginning in December 2021. By design, 120 patients from the Texas Comprehensive Epilepsy Program are to be enrolled to test the INSIGHT workflow. Initial results are expected in the first half of 2023. CONCLUSIONS: INSIGHT's domain-specific, practical but generalizable approach may help catalyze a pathway to accelerate translation of genomic knowledge into impactful interventions in patient care.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/25576.
ABSTRACT
Chronic obstructive pulmonary disease (COPD), diagnosed by reduced lung function, is a leading cause of morbidity and mortality. We performed whole genome sequence (WGS) analysis of lung function and COPD in a multi-ethnic sample of 11,497 participants from population- and family-based studies, and 8499 individuals from COPD-enriched studies in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program. We identify at genome-wide significance 10 known GWAS loci and 22 distinct, previously unreported loci, including two common variant signals from stratified analysis of African Americans. Four novel common variants within the regions of PIAS1, RGN (two variants) and FTO show evidence of replication in the UK Biobank (European ancestry n ~ 320,000), while colocalization analyses leveraging multi-omic data from GTEx and TOPMed identify potential molecular mechanisms underlying four of the 22 novel loci. Our study demonstrates the value of performing WGS analyses and multi-omic follow-up in cohorts of diverse ancestry.