ABSTRACT
Whole-genome sequencing resolves many clinical cases where standard diagnostic methods have failed. However, at least half of these cases remain unresolved after whole-genome sequencing. Structural variants (SVs; genomic variants larger than 50 base pairs) of uncertain significance are the genetic cause of a portion of these unresolved cases. As sequencing methods using long or linked reads become more accessible and SV detection algorithms improve, clinicians and researchers are gaining access to thousands of reliable SVs of unknown disease relevance. Methods to predict the pathogenicity of these SVs are required to realize the full diagnostic potential of long-read sequencing. To address this emerging need, we developed StrVCTVRE to distinguish pathogenic SVs from benign SVs that overlap exons. In a random forest classifier, we integrated features that capture gene importance, coding region, conservation, expression, and exon structure. We found that features such as expression and conservation are important but are absent from SV classification guidelines. We leveraged multiple resources to construct a size-matched training set of rare, putatively benign and pathogenic SVs. StrVCTVRE performs accurately across a wide SV size range on independent test sets, which will allow clinicians and researchers to eliminate about half of SVs from consideration while retaining a 90% sensitivity. We anticipate clinicians and researchers will use StrVCTVRE to prioritize SVs in probands where no SV is immediately compelling, empowering deeper investigation into novel SVs to resolve cases and understand new mechanisms of disease. StrVCTVRE runs rapidly and is publicly available.
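The operating point mentioned above (eliminating about half of SVs while retaining 90% sensitivity) can be illustrated with a minimal sketch of how a score cutoff might be chosen from labelled predictions. The function name and the toy scores are hypothetical and are not taken from StrVCTVRE; the sketch only assumes that a higher score means "more likely pathogenic".

```python
import math


def threshold_at_sensitivity(scores, labels, target=0.90):
    """Return (cutoff, sensitivity, fraction_eliminated).

    scores: predicted pathogenicity scores (assumed: higher = more pathogenic)
    labels: 1 for pathogenic, 0 for benign
    SVs scoring below the cutoff are eliminated from consideration.
    """
    pathogenic = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not pathogenic:
        raise ValueError("need at least one pathogenic example")
    # Smallest number of pathogenic SVs that must be kept to reach the target;
    # the small epsilon guards against floating-point round-up.
    keep = max(1, math.ceil(target * len(pathogenic) - 1e-9))
    cutoff = pathogenic[keep - 1]
    retained = sum(1 for s in pathogenic if s >= cutoff)
    sensitivity = retained / len(pathogenic)
    eliminated = sum(1 for s in scores if s < cutoff) / len(scores)
    return cutoff, sensitivity, eliminated


# Toy, made-up scores: ten pathogenic SVs followed by ten benign SVs.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
          0.70, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05]
labels = [1] * 10 + [0] * 10

cutoff, sensitivity, eliminated = threshold_at_sensitivity(scores, labels)
print(cutoff, sensitivity, eliminated)
```

In practice the cutoff would be chosen on held-out data rather than on the training set, so that the reported sensitivity is not optimistic.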
Subject(s)
Algorithms , Human Genome , Genomic Structural Variation , Software , Supervised Machine Learning , Datasets as Topic , Exons , Genomics/methods , Humans , ROC Curve , Whole Genome Sequencing/statistics & numerical data
ABSTRACT
Genome sequencing is enabling precision medicine: tailoring treatment to the unique constellation of variants in an individual's genome. The impact of recurrent pathogenic variants is often understood; however, there is a long tail of rare genetic variants that are uncharacterized. The problem of uncharacterized rare variation is especially acute when it occurs in genes of known clinical importance with functionally consequential variants and associated mechanisms. Variants of uncertain significance (VUSs) in these genes are discovered at a rate that outpaces the current ability to classify them with databases of previous cases, experimental evaluation, and computational predictors. Clinicians are thus left without guidance about the significance of variants that may have actionable consequences. Computational prediction of the impact of rare genetic variation is becoming an increasingly important capability. In this paper, we review the technical and ethical challenges of interpreting the function of rare variants in two settings: inborn errors of metabolism in newborns and pharmacogenomics. We propose a framework for a genomic learning healthcare system with an initial focus on early-onset treatable disease in newborns and actionable pharmacogenomics. We argue that (1) a genomic learning healthcare system must allow for continuous collection and assessment of rare variants, (2) emerging machine learning methods will enable algorithms to predict the clinical impact of rare variants on protein function, and (3) ethical considerations must inform the construction and deployment of all rare-variation triage strategies, particularly with respect to health disparities arising from unbalanced ancestry representation.
Subject(s)
Genetic Variation/genetics , Medical Genetics , Genomics , Machine Learning , Inborn Errors of Metabolism/genetics , Pharmacogenetics , Precision Medicine , Human Genome/genetics , Humans , Newborn Infant
ABSTRACT
Rapid divergence and subsequent recurring patterns of gene flow can complicate our ability to discern phylogenetic relationships among closely related species. The degree to which such patterns differ across the genome offers an opportunity to better infer how life history constraints may influence species boundaries. By exploring differences between autosomal and Z (or X) chromosomal-derived phylogenetic patterns, we can better identify factors that may limit introgression despite patterns of incomplete lineage sorting among closely related taxa. Here, using a whole-genome resequencing approach coupled with an exhaustive sampling of subspecies within the recently divergent prairie grouse complex (genus: Tympanuchus), including the extinct Heath Hen (T. cupido cupido), we show that their phylogenomic history differs depending on autosomal or Z-chromosome partitioned SNPs. Because the Heath Hen was allopatric relative to the other prairie grouse taxa, its phylogenetic signature should not be influenced by gene flow. In contrast, all the other extant prairie grouse taxa, except Attwater's Prairie-chicken (T. c. attwateri), possess overlapping contemporary geographic distributions and have been known to hybridize. After excluding samples that were likely translocated prairie grouse from the Midwest to the eastern coastal states or their resulting hybrids with mainland Heath Hens, species tree analyses based on autosomal SNPs consistently identified a paraphyletic relationship with regard to the Heath Hen, with Lesser Prairie-chicken (T. pallidicinctus) sister to Greater Prairie-chicken (T. c. pinnatus) regardless of genic or intergenic partitions. In contrast, species trees based on the Z-chromosome were consistent with Heath Hen sister to a clade that included its conspecifics, Greater and Attwater's Prairie-chickens (T. c. attwateri).
These results were further explained by historic gene flow, as shown with an excess of autosomal SNPs shared between Lesser and Greater Prairie-chickens but not with the Z-chromosome. Phylogenetic placement of Sharp-tailed Grouse (T. phasianellus), however, did not differ among analyses and was sister to a clade that included all other prairie grouse despite low levels of autosomal gene flow with Greater Prairie-chicken. These results, along with strong sexual selection (i.e., male hybrid behavioral isolation) and a lek breeding system (i.e., high variance in male mating success), are consistent with a pattern of female-biased introgression between prairie grouse taxa with overlapping geographic distributions. Additional study is warranted to explore how genomic components associated with the Z-chromosome influence the phenotype and thereby impact species limits among prairie grouse taxa despite ongoing contemporary gene flow.
Subject(s)
Chickens , Grassland , Animals , Female , Phylogeny
ABSTRACT
Biofilms are surface-associated bacterial communities that are crucial in nature and during infection. Despite extensive work to identify biofilm components and to discover how they are regulated, little is known about biofilm structure at the level of individual cells. Here, we use state-of-the-art microscopy techniques to enable live single-cell resolution imaging of a Vibrio cholerae biofilm as it develops from one single founder cell to a mature biofilm of 10,000 cells, and to discover the forces underpinning the architectural evolution. Mutagenesis, matrix labeling, and simulations demonstrate that surface adhesion-mediated compression causes V. cholerae biofilms to transition from a 2D branched morphology to a dense, ordered 3D cluster. We discover that directional proliferation of rod-shaped bacteria plays a dominant role in shaping the biofilm architecture in V. cholerae biofilms, and this growth pattern is controlled by a single gene, rbmA. Competition analyses reveal that the dense growth mode has the advantage of providing the biofilm with superior mechanical properties. Our single-cell technology can broadly link genes to biofilm fine structure and provides a route to assessing cell-to-cell heterogeneity in response to external stimuli.
Subject(s)
Bacterial Proteins/genetics , Biofilms/growth & development , Single-Cell Analysis/methods , Vibrio cholerae/ultrastructure , Bacterial Adhesion/genetics , Cell Proliferation/genetics , Humans , Vibrio cholerae/genetics , Vibrio cholerae/growth & development , Vibrio cholerae/pathogenicity
ABSTRACT
BACKGROUND: Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines. METHODS: Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years, across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were classified by the databases as pathogenic. Due to the rarity of IEMs, nearly all such classified pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD. RESULTS: While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that African ancestry individuals have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than DM or DM? variants in HGMD, which has likely resulted in ClinVar's lower false-positive rate. CONCLUSIONS: Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. 
We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.
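The screening step described in the methods above (finding individuals whose genotypes a database would call affected) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It ASSUMES a recessive model in which a sample is flagged when it carries at least two pathogenic-classified alleles in the same gene, with phase ignored. All sample, gene, and variant identifiers are hypothetical.

```python
from collections import defaultdict


def implied_affected(genotypes, pathogenic):
    """Return the set of sample IDs whose genotypes imply disease.

    genotypes: iterable of (sample_id, gene, variant_id, alt_allele_count)
    pathogenic: set of variant_ids that a database classifies as pathogenic
    ASSUMPTION: recessive model; a sample is flagged when it carries two or
    more pathogenic alleles in one gene (homozygous, or two heterozygous
    variants with phase ignored).
    """
    alleles = defaultdict(int)
    for sample, gene, variant, alt_count in genotypes:
        if variant in pathogenic:
            alleles[(sample, gene)] += alt_count
    return {sample for (sample, _gene), n in alleles.items() if n >= 2}


# Hypothetical toy data.
pathogenic = {"v1", "v2", "v4"}
genotypes = [
    ("S1", "GENE_A", "v1", 2),  # homozygous for a pathogenic variant
    ("S2", "GENE_A", "v1", 1),  # two heterozygous pathogenic variants,
    ("S2", "GENE_A", "v2", 1),  # same gene
    ("S3", "GENE_A", "v1", 1),  # single heterozygous carrier
    ("S4", "GENE_A", "v3", 2),  # homozygous, but v3 is not pathogenic
    ("S5", "GENE_A", "v1", 1),  # carrier in two different genes
    ("S5", "GENE_B", "v4", 1),
]

flagged = implied_affected(genotypes, pathogenic)
print(sorted(flagged))
```

Because the screened disorders are rare, most samples flagged this way in a population cohort would point to a likely misclassified variant rather than to a truly affected individual, which is the reasoning the abstract relies on.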
Subject(s)
Genetic Databases , Genetic Variation , Gene Frequency , Genotype , Genomics
ABSTRACT
Current genetic tests for rare diseases provide a diagnosis in only a modest proportion of cases. The Full-Genome Analysis method, FGA, combines long-range assembly and whole-genome sequencing to detect small variants, structural variants with breakpoint resolution, and phasing. We built a variant prioritization pipeline and tested FGA's utility for diagnosis of rare diseases in a clinical setting. FGA identified structural variants and small variants with an overall diagnostic yield of 40% (20 of 50 cases) and 35% in exome-negative cases (8 of 23 cases); 4 of these were structural variants. FGA detected and mapped structural variants that are missed by short reads, including non-coding duplication, and phased variants across long distances of more than 180 kb. With the prioritization algorithm, longer DNA technologies could replace multiple tests for monogenic disorders and expand the range of variants detected. Our study suggests that genomes produced from technologies like FGA can improve variant detection and provide higher resolution genome maps for future application.