Results 1 - 20 of 160
1.
Hum Genomics ; 18(1): 44, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when variants are captured genome-wide. Computational methods to aid the interpretation and prioritization of the vast number of detected variants are proliferating, but which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We used genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams with identifying causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. 
Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new ones. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
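The maximum F-measure metric named above can be illustrated with a short sketch. Each distinct EPCR value is treated as a threshold. At each threshold, precision and recall of causal variants are computed among the predictions at or above it, and the best balanced F-measure (F1) is kept. This is an illustration under stated assumptions, not the CAGI assessors' code. It assumes F1, assumes recall is computed against all known causal variants, and uses hypothetical variant IDs and scores.

```python
from typing import Dict, Set


def max_f_measure(epcr: Dict[str, float], causal: Set[str]) -> float:
    """Return the maximum F1 over all EPCR thresholds.

    epcr   -- hypothetical mapping from variant ID to its submitted EPCR value
    causal -- set of variant IDs known to be causal (ground truth)
    """
    if not causal:
        raise ValueError("need at least one causal variant")
    best = 0.0
    # Every distinct EPCR value is a candidate threshold.
    for threshold in sorted(set(epcr.values()), reverse=True):
        called = {v for v, p in epcr.items() if p >= threshold}
        tp = len(called & causal)
        if tp == 0:
            continue  # F1 is zero when no causal variant is recalled
        precision = tp / len(called)
        recall = tp / len(causal)
        f1 = 2 * precision * recall / (precision + recall)
        best = max(best, f1)
    return best


# Toy example: two causal variants, ranked first and third by EPCR.
predictions = {"var1": 0.9, "var2": 0.6, "var3": 0.4, "var4": 0.1}
print(round(max_f_measure(predictions, {"var1", "var3"}), 3))
```

In this toy case the best threshold is 0.4: both causal variants are recalled among three calls, so precision is 2/3, recall is 1, and F1 is 0.8.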


Subject(s)
Rare Diseases, Humans, Rare Diseases/genetics, Rare Diseases/diagnosis, Genome, Human/genetics, Genetic Variation/genetics, Computational Biology/methods, Phenotype
2.
Nat Microbiol ; 9(5): 1382-1392, 2024 May.
Article in English | MEDLINE | ID: mdl-38649410

ABSTRACT

RNA viruses such as SARS-CoV-2 depend on their RNA-dependent RNA polymerases (RdRp) for replication, an error-prone process. Monitoring replication errors is crucial for understanding the virus's evolution. Current methods lack the precision to detect rare de novo RNA mutations, particularly in low-input samples such as those from patients. Here we introduce a targeted accurate RNA consensus sequencing method (tARC-seq) to accurately determine the frequency and types of mutations in SARS-CoV-2, both in cell culture and in clinical samples. Our findings show an average of 2.68 × 10⁻⁵ de novo errors per cycle, with a C > T bias that cannot be attributed solely to APOBEC editing. We identified hotspots and cold spots throughout the genome, correlating with high or low GC content, and pinpointed transcription regulatory sites as regions more susceptible to errors. tARC-seq captured template-switching events, including insertions, deletions and complex mutations. These insights shed light on how SARS-CoV-2 generates genetic diversity and on its evolutionary dynamics.


Subject(s)
COVID-19, Genome, Viral, Mutation, RNA, Viral, SARS-CoV-2, Virus Replication, SARS-CoV-2/genetics, Humans, Virus Replication/genetics, COVID-19/virology, Genome, Viral/genetics, RNA, Viral/genetics, Sequence Analysis, RNA/methods, Evolution, Molecular, Mutation Rate
3.
Am J Hum Genet ; 111(3): 487-508, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38325380

ABSTRACT

Pathogenic variants in multiple genes on the X chromosome have been implicated in syndromic and non-syndromic intellectual disability disorders. ZFX on Xp22.11 encodes a transcription factor that has been linked to diverse processes, including oncogenesis and development, but germline variants have not previously been characterized in association with disease. Here, we present the clinical and molecular characterization of 18 individuals with germline ZFX variants. Exome or genome sequencing revealed 11 variants in 18 subjects (14 males and 4 females) from 16 unrelated families. Four missense variants were identified in 11 subjects, and seven truncating variants in the remaining individuals. Clinical findings included developmental delay/intellectual disability, behavioral abnormalities, hypotonia, and congenital anomalies. Overlapping and recurrent facial features were identified in all subjects, including thickening and medial broadening of the eyebrows, variations in the shape of the face, external eye abnormalities, a smooth and/or long philtrum, and ear abnormalities. Hyperparathyroidism was found in four families with missense variants, and an enrichment of different tumor types was observed. In molecular studies, DNA-binding domain variants elicited differential expression of a small set of target genes relative to wild-type ZFX in cultured cells, suggesting a gain or loss of transcriptional activity. Additionally, a zebrafish model of ZFX loss displayed an altered behavioral phenotype, providing further evidence for the functional significance of ZFX. Our clinical and experimental data support the conclusion that variants in ZFX are associated with an X-linked intellectual disability syndrome characterized by a recurrent facial gestalt, neurocognitive and behavioral abnormalities, and an increased risk of congenital anomalies and hyperparathyroidism.


Subject(s)
Hyperparathyroidism, Intellectual Disability, Neurodevelopmental Disorders, Male, Female, Animals, Humans, Intellectual Disability/pathology, Zebrafish/genetics, Mutation, Missense/genetics, Transcription Factors/genetics, Phenotype, Neurodevelopmental Disorders/genetics
4.
Hum Genet ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38170232

ABSTRACT

Variants which disrupt splicing are a frequent cause of rare disease that have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.
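As a worked illustration of the metrics compared above, the sketch below computes accuracy, sensitivity and specificity from binary predictions of splicing disruption. The confusion-matrix counts are hypothetical. The only figure taken from the text is the set size of 56 variants. On that set, an accuracy of 82% corresponds to about 46 correct calls (46/56 ≈ 0.821).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # splice-disrupting variants predicted as disrupting
    fn: int  # splice-disrupting variants predicted as not disrupting
    tn: int  # non-disrupting variants predicted as not disrupting
    fp: int  # non-disrupting variants predicted as disrupting

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        # Fraction of truly disrupting variants that were caught.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of truly non-disrupting variants correctly left alone.
        return self.tn / (self.tn + self.fp)


# Hypothetical split of 56 validated variants (not the challenge's real counts).
c = Confusion(tp=28, fn=4, tn=18, fp=6)
print(c.total, round(c.accuracy(), 3), round(c.sensitivity(), 3), round(c.specificity(), 3))
```

Separating sensitivity from specificity in this way explains how one method can lead on sensitivity while another leads on specificity, even when their overall accuracy is similar.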

5.
Nat Metab ; 5(10): 1673-1684, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37709961

ABSTRACT

The glucagon-like peptide 1 receptor (GLP1R) is a major drug target, with several agonists prescribed to individuals with type 2 diabetes and obesity¹,². The impact of genetic variability in GLP1R on receptor function, and its association with metabolic traits, are unclear, with conflicting reports. Here, after functionally profiling 60 GLP1R variants across four signalling pathways, we show an unexpected diversity of phenotypes, ranging from defective cell surface expression to complete or pathway-specific gain of function (GoF) and loss of function (LoF). The defective insulin secretion of GLP1R LoF variants is rescued by allosteric GLP1R ligands or high concentrations of exendin-4/semaglutide in INS-1 832/3 cells. Genetic association studies in 200,000 participants from the UK Biobank show that impaired GLP1R cell surface expression contributes to poor glucose control and increased adiposity, with increased glycated haemoglobin A1c and body mass index. This study defines impaired GLP1R cell surface expression as a risk factor for traits associated with type 2 diabetes and obesity and provides potential treatment options for carriers of GLP1R LoF variants.


Subject(s)
Blood Glucose, Diabetes Mellitus, Type 2, Humans, Insulin/metabolism, Diabetes Mellitus, Type 2/genetics, Adiposity/genetics, Obesity/genetics
6.
Res Sq ; 2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37577579

ABSTRACT

In the context of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge to provide an opportunity to develop computational methods for predicting patients' phenotypes and causal variants. Eight research teams, with 30 models, had access to phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by neurodevelopmental disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. In this study, we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described in each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes in patients without a genetic diagnosis. As in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we released the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge as training and validation data for developing the prediction methods.

7.
medRxiv ; 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37577678

ABSTRACT

Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: Predictors were provided a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on rank position of true positive causal variants and maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. 
In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.

8.
J Am Heart Assoc ; 12(17): e029103, 2023 09 05.
Article in English | MEDLINE | ID: mdl-37642027

ABSTRACT

Background Coronary artery disease is a primary cause of death around the world, with both genetic and environmental risk factors. Although genome-wide association studies have linked >100 unique loci to its genetic basis, these only explain a fraction of disease heritability. Methods and Results To find additional gene drivers of coronary artery disease, we applied machine learning to quantitative evolutionary information on the impact of coding variants in whole exomes from the Myocardial Infarction Genetics Consortium. Using ensemble-based supervised learning, the Evolutionary Action-Machine Learning framework ranked each gene's ability to classify case and control samples and identified 79 significant associations. These were connected to known risk loci; enriched in cardiovascular processes like lipid metabolism, blood clotting, and inflammation; and enriched for cardiovascular phenotypes in knockout mouse models. Among them, INPP5F and MST1R are examples of potentially novel coronary artery disease risk genes that modulate immune signaling in response to cardiac stress. Conclusions We concluded that machine learning on the functional impact of coding variants, based on a massive amount of evolutionary information, has the power to suggest novel coronary artery disease risk genes for mechanistic and therapeutic discoveries in cardiovascular biology, and should also apply in other complex polygenic diseases.


Subject(s)
Coronary Artery Disease, Animals, Mice, Coronary Artery Disease/genetics, Genome-Wide Association Study, Supervised Machine Learning, Biological Evolution, Machine Learning, Mice, Knockout
9.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37522889

ABSTRACT

SUMMARY: In any population under selective pressure, a central challenge is to distinguish the genes that drive adaptation from others which, subject to population variation, harbor many neutral de novo mutations. We recently showed that such genes could be identified by supplementing information on mutational frequency with an evolutionary analysis of the likely functional impact of coding variants. This approach improved the discovery of driver genes in both lab-evolved and environmental Escherichia coli strains. To facilitate general adoption, we have now developed ShinyBioHEAT, an R Shiny web-based application that enables the identification of phenotype-driving genes in two commonly used model bacteria, E. coli and Bacillus subtilis, without requiring specific computational skills. ShinyBioHEAT not only supports transparent and interactive analysis of lab evolution data in E. coli and B. subtilis, but also creates dynamic visualizations of mutational impact on protein structures, which add orthogonal checks on predicted drivers. AVAILABILITY AND IMPLEMENTATION: Code for ShinyBioHEAT is available at https://github.com/LichtargeLab/ShinyBioHEAT. The Shiny application is additionally hosted at http://bioheat.lichtargelab.org/.


Subject(s)
Escherichia coli, Mobile Applications, Escherichia coli/genetics, Software, Mutation, Data Interpretation, Statistical, Mutation Rate
10.
J Biol Chem ; 299(7): 104896, 2023 07.
Article in English | MEDLINE | ID: mdl-37290531

ABSTRACT

Measuring the relative effect that any two sequence positions have on each other may improve protein design or help interpret coding variants. Current approaches use statistics and machine learning but rarely consider phylogenetic divergences, which, as shown by Evolutionary Trace studies, provide insight into the functional impact of sequence perturbations. Here, we reframe covariation analyses in the Evolutionary Trace framework to measure the relative tolerance to perturbation of each residue pair during evolution. This approach (CovET) systematically accounts for phylogenetic divergences: at each divergence event, we penalize covariation patterns that belie evolutionary coupling. We find that while CovET approximates the performance of existing methods in predicting individual structural contacts, it performs significantly better at finding structural clusters of coupled residues and ligand binding sites. For example, CovET found more functionally critical residues when we examined the RNA recognition motif and WW domains. It also correlates better with large-scale epistasis screen data. In the dopamine D2 receptor, top CovET residue pairs accurately recovered the allosteric activation pathway characterized for Class A G protein-coupled receptors. These data suggest that CovET ranks highest the sequence position pairs that play critical functional roles through epistatic and allosteric interactions in evolutionarily relevant structure-function motifs. CovET complements current methods and may shed light on fundamental molecular mechanisms of protein structure and function.


Subject(s)
Evolution, Molecular, Sequence Alignment, Binding Sites/genetics, Phylogeny, Receptors, G-Protein-Coupled/genetics, Sequence Alignment/methods
11.
Nat Commun ; 14(1): 2765, 2023 05 13.
Article in English | MEDLINE | ID: mdl-37179358

ABSTRACT

The incidence of Alzheimer's disease in females is almost double that in males. To search for sex-specific gene associations, we built a machine learning approach focused on functionally impactful coding variants. This method can detect differences between sequenced cases and controls in small cohorts. In the Alzheimer's Disease Sequencing Project cohort with mixed sexes, this approach identified genes enriched for immune response pathways. After separation by sex, the genes became specifically enriched for stress-response pathways in males and cell-cycle pathways in females. These genes improve disease risk prediction in silico and modulate neurodegeneration in Drosophila in vivo. Thus, a general approach for machine learning on functionally impactful variants can uncover sex-specific candidates for diagnostic biomarkers and therapeutic targets.


Subject(s)
Alzheimer Disease, Sex Factors, Female, Humans, Male, Alzheimer Disease/genetics, Alzheimer Disease/metabolism
12.
Br J Cancer ; 128(11): 2013-2024, 2023 06.
Article in English | MEDLINE | ID: mdl-37012319

ABSTRACT

BACKGROUND: Cisplatin (CDDP) is a mainstay treatment for advanced head and neck squamous cell carcinomas (HNSCC) despite a high frequency of innate and acquired resistance. We hypothesised that tumours acquire CDDP resistance through an enhanced reductive state dependent on metabolic rewiring. METHODS: To validate this model and understand how an adaptive metabolic programme might be imprinted, we performed an integrated analysis of CDDP-resistant HNSCC clones from multiple genomic backgrounds by whole-exome sequencing, RNA-seq, mass spectrometry, steady state and flux metabolomics. RESULTS: Inactivating KEAP1 mutations or reductions in KEAP1 RNA correlated with Nrf2 activation in CDDP-resistant cells, which functionally contributed to resistance. Proteomics identified elevation of downstream Nrf2 targets and the enrichment of enzymes involved in generation of biomass and reducing equivalents, metabolism of glucose, glutathione, NAD(P), and oxoacids. This was accompanied by biochemical and metabolic evidence of an enhanced reductive state dependent on coordinated glucose and glutamine catabolism, associated with reduced energy production and proliferation, despite normal mitochondrial structure and function. CONCLUSIONS: Our analysis identified coordinated metabolic changes associated with CDDP resistance that may provide new therapeutic avenues through targeting of these convergent pathways.


Subject(s)
Antineoplastic Agents, Head and Neck Neoplasms, Humans, Cisplatin/metabolism, Squamous Cell Carcinoma of Head and Neck, Kelch-Like ECH-Associated Protein 1/genetics, NF-E2-Related Factor 2/genetics, Drug Resistance, Neoplasm/genetics, Cell Line, Tumor, Glucose, Antineoplastic Agents/pharmacology
13.
J Biol Chem ; 299(4): 103030, 2023 04.
Article in English | MEDLINE | ID: mdl-36806686

ABSTRACT

Upon ligand binding to a G protein-coupled receptor, extracellular signals are transmitted into the cell through sets of residue interactions that translate ligand binding into structural rearrangements. These functionally required interactions impose evolutionary constraints, so that mutations at one position may occasionally be compensated by mutations at functionally coupled positions. To quantify the impact of amino acid substitutions in the context of major evolutionary divergence in the G protein-coupled receptor subfamily of metabotropic glutamate receptors (mGluRs), we combined two phylogeny-based algorithms, Evolutionary Trace and covariation Evolutionary Trace, to infer potential structure-function couplings and roles in mGluRs. We found a subset of evolutionarily important residues at known functional sites and evidence of coupling among distinct structural clusters in mGluRs. In addition, experimental mutagenesis and functional assays confirmed that some highly covariant residues are coupled, revealing their synergy. Collectively, these findings represent a critical step toward understanding the molecular and structural basis of amino acid variation patterns within mGluRs and provide insight for drug development, protein engineering, and the analysis of naturally occurring variants.


Subject(s)
Receptors, Metabotropic Glutamate, Receptors, Metabotropic Glutamate/genetics, Receptors, Metabotropic Glutamate/metabolism, Binding Sites, Phylogeny, Ligands, Receptors, G-Protein-Coupled/genetics
14.
Adv Radiat Oncol ; 7(6): 100989, 2022.
Article in English | MEDLINE | ID: mdl-36420184

ABSTRACT

Purpose: An evolutionary action scoring algorithm (EAp53) based on phylogenetic sequence variations stratifies patients with head and neck squamous cell carcinoma (HNSCC) bearing TP53 missense mutations as high-risk, associated with poor outcomes, or low-risk, with outcomes similar to those of TP53 wild-type patients, and has been validated as a reliable prognostic marker. We performed this study to further validate prior findings demonstrating that EAp53 is a prognostic marker for patients with locally advanced HNSCC, and to explore its predictive value for outcomes of adjuvant bio-chemoradiotherapy. Methods and Materials: Eighty-one resection samples from patients treated surgically for stage III or IV human papillomavirus-negative HNSCC with high-risk pathologic features, who received either radiation therapy + cetuximab + cisplatin (cisplatin) or radiation therapy + cetuximab + docetaxel (docetaxel) as adjuvant treatment in a phase 2 study, were subjected to TP53 targeted sequencing and EAp53 scoring to correlate with clinical outcomes. Due to the limited sample size, patients were combined into 2 EAp53 groups: (1) wild-type or low-risk; and (2) high-risk or other. Results: At a median follow-up of 9.8 years, there was a significant interaction between EAp53 group and treatment for overall survival (P = .008), disease-free survival (P = .05), and distant metastasis (DM; P = .004). In the wild-type or low-risk group, the docetaxel arm showed significantly better overall survival (hazard ratio [HR] 0.11, [0.03-0.36]), disease-free survival (HR 0.24, [0.09-0.61]), and less DM (HR 0.04, [0.01-0.31]) than the cisplatin arm. In the high-risk or other group, differences between treatments were not statistically significant. Conclusions: The docetaxel arm was associated with better survival than the cisplatin arm for patients with wild-type or low-risk EAp53. These benefits appear to be largely driven by a reduction in DM.

15.
Cell Genom ; 2(9)2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36268052

ABSTRACT

Most disease-gene association methods do not account for gene-gene interactions, even though these play a crucial role in complex, polygenic diseases like Alzheimer's disease (AD). To discover new genes whose interactions may contribute to pathology, we introduce GeneEMBED. This approach compares the functional perturbations induced in gene interaction network neighborhoods by coding variants from diseased versus healthy subjects. In two independent AD cohorts of 5,169 exomes and 969 genomes, GeneEMBED identified novel candidates. These genes were differentially expressed in post mortem AD brains and modulated neurological phenotypes in mice. Four genes that were differentially overexpressed and modified neurodegeneration in vivo are PLEC, UTRN, TP53, and POLD1. Notably, TP53 and POLD1 are involved in DNA break repair and are inhibited by approved drugs. While these data show proof of concept in AD, GeneEMBED is a general approach that should be broadly applicable to identifying genes relevant to the risk mechanisms and therapy of other complex diseases.

16.
ACS Pharmacol Transl Sci ; 5(2): 89-101, 2022 Feb 11.
Article in English | MEDLINE | ID: mdl-35846981

ABSTRACT

G protein-coupled receptors (GPCRs) can engage distinct subsets of signaling pathways, but the structural determinants of this functional selectivity remain elusive. Naturally occurring genetic variants of GPCRs that selectively affect different pathways offer an opportunity to explore this phenomenon. We previously identified 40 coding variants of the MTNR1B gene, which encodes the melatonin MT2 receptor (MT2). These mutations differentially affect β-arrestin 2 recruitment, ERK activation, cAMP production, and Gαi1 and Gαz activation. In this study, we combined functional clustering and structural modeling to delineate the molecular features controlling MT2 functional selectivity. Using non-negative matrix factorization, we analyzed the signaling signatures of the 40 MT2 variants, yielding eight clusters defined by unique signaling features and localized in distinct domains of MT2. Using computational homology modeling, we describe how specific mutations can selectively affect subsets of signaling pathways, and we offer a proof of principle that natural variants can be used to explore and understand GPCR functional selectivity.
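The clustering step described above can be sketched with non-negative matrix factorization (NMF). A non-negative matrix V (variants × signaling readouts) is approximated as W·H, and each variant is assigned to the factor with the largest weight in its row of W. This minimal sketch uses the classic Lee-Seung multiplicative updates in NumPy on a random toy matrix. The rank, iteration count, data and argmax assignment rule are all assumptions, not the study's settings or pipeline.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 500, seed: int = 0):
    """Factor a non-negative matrix V (n x m) as W (n x rank) @ H (rank x m).

    Uses Lee-Seung multiplicative updates, which do not increase the
    Frobenius reconstruction error and keep W and H non-negative.
    """
    if np.any(V < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: 40 hypothetical variants x 5 scaled pathway readouts.
rng = np.random.default_rng(42)
V = rng.random((40, 5))
W, H = nmf(V, rank=3)
clusters = W.argmax(axis=1)  # each variant joins its dominant factor
print(clusters.shape, np.linalg.norm(V - W @ H) < np.linalg.norm(V))
```

In practice, the rank (the number of clusters) would be chosen by a stability or model-selection criterion rather than fixed in advance as it is here.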

17.
Res Sq ; 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35677076

ABSTRACT

Both the SARS-CoV-2 virus and its mRNA vaccines depend on RNA polymerases (RNAP)¹,²; however, these enzymes are inherently error-prone and can introduce variants into the RNA³. To understand SARS-CoV-2 evolution and vaccine efficacy, it is critical to identify the extent and distribution of errors introduced by the RNAPs involved in each process. Current methods lack the sensitivity and specificity to measure de novo RNA variants in low-input samples such as viral isolates³. Here, we determine the frequency and nature of RNA errors in both SARS-CoV-2 and its vaccine using a targeted Accurate RNA Consensus sequencing method (tARC-seq). We found that the viral RNA-dependent RNAP (RdRp) makes ~1 error every 10,000 nucleotides, higher than previous estimates⁴. We also observed that RNA variants are not randomly distributed across the genome but are associated with certain genomic features and genes, such as S (Spike). tARC-seq captured a number of large insertions, deletions and complex mutations that can be modeled through non-programmed RdRp template switching. This template-switching feature of RdRp explains many key genetic changes observed during the evolution of different lineages worldwide, including Omicron. Further sequencing of the Pfizer-BioNTech COVID-19 vaccine revealed an RNA variant frequency of ~1 in 5,000, meaning that most of the vaccine transcripts produced in vitro by T7 phage RNAP harbor a variant. These results demonstrate the extraordinary genetic diversity of viral populations and the heterogeneous nature of an mRNA vaccine fueled by RNAP inaccuracy. Along with functional studies and pandemic data, tARC-seq variant spectra can inform models to predict how SARS-CoV-2 may evolve. Finally, our results may help improve future vaccine development and study design as mRNA therapies continue to gain traction.
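The step from a per-nucleotide variant frequency to "most transcripts harbor a variant" is simple arithmetic if errors are assumed to be independent and uniform along the molecule. Under that assumption, the chance that a transcript of length L carries at least one variant is 1 - (1 - p)^L. The transcript length used below (about 4,300 nt, roughly the size of the vaccine mRNA) is an assumption that does not appear in the abstract, as is the independence model itself.

```python
def p_at_least_one_variant(error_rate: float, length: int) -> float:
    """Probability that a transcript of `length` nt carries >= 1 variant,
    assuming independent errors at a uniform per-nucleotide `error_rate`."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability")
    if length < 0:
        raise ValueError("length must be non-negative")
    return 1.0 - (1.0 - error_rate) ** length


# Rates from the text: ~1 per 5,000 nt (vaccine) and ~1 per 10,000 nt (viral RdRp).
# The 4,300 nt transcript length is an assumption for illustration.
for label, rate in [("vaccine", 1 / 5000), ("RdRp", 1 / 10000)]:
    print(label, round(p_at_least_one_variant(rate, 4300), 2))
```

Under these assumptions, the vaccine figure comes out near 0.58, which is consistent with the statement that a majority of transcripts carry at least one variant.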

18.
Nat Commun ; 13(1): 3189, 2022 06 09.
Article in English | MEDLINE | ID: mdl-35680894

ABSTRACT

Because antibiotic development lags, we search for potential drug targets through directed evolution experiments. A challenge is that many resistance genes hide in a noisy mutational background as mutator clones emerge in the adaptive population. Here, to overcome this noise, we quantify the impact of mutations through evolutionary action (EA). After sequencing ciprofloxacin- or colistin-resistant strains grown under different mutational regimes, we find that an elevated sum of the evolutionary action of mutations in a gene identifies known resistance drivers. This EA integration approach also suggests new antibiotic resistance genes, which are then shown to provide a fitness advantage in competition experiments. Moreover, EA integration analysis of clinical and environmental isolates of antibiotic-resistant E. coli identifies gene drivers of resistance where a standard approach fails. Together, these results inform the genetic basis of de novo colistin resistance and support the robust discovery of phenotype-driving genes via the evolutionary action of genetic perturbations in fitness landscapes.
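The gene-level signal described above, an elevated sum of the evolutionary action of a gene's mutations, can be sketched as a simple aggregation. This sketch assumes per-mutation EA scores on a 0-100 scale. It compares each gene's observed EA sum with a naive expectation proportional to gene length. The gene names, scores, lengths and null model are hypothetical simplifications, not the published EA integration method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ea_sums(mutations: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum per-mutation EA scores (assumed 0-100 scale) for each gene."""
    totals: Dict[str, float] = defaultdict(float)
    for gene, ea in mutations:
        totals[gene] += ea
    return dict(totals)


def rank_genes(
    mutations: List[Tuple[str, float]], gene_lengths: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Rank genes by observed EA sum divided by a naive expected sum.

    Hypothetical null model: the total EA mass is spread across genes in
    proportion to their lengths, which must all be positive.
    """
    observed = ea_sums(mutations)
    total_ea = sum(observed.values())
    if total_ea == 0 or any(n <= 0 for n in gene_lengths.values()):
        raise ValueError("need non-zero EA mass and positive gene lengths")
    total_len = sum(gene_lengths.values())
    ratios = []
    for gene, length in gene_lengths.items():
        expected = total_ea * length / total_len
        ratios.append((gene, observed.get(gene, 0.0) / expected))
    # Highest observed/expected ratio first.
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# Toy data: (gene, EA score of one mutation); lengths in codons.
muts = [("geneA", 90.0), ("geneA", 85.0), ("geneB", 10.0), ("geneC", 15.0), ("geneC", 5.0)]
lengths = {"geneA": 300, "geneB": 300, "geneC": 600}
for gene, ratio in rank_genes(muts, lengths):
    print(gene, round(ratio, 2))
```

Summing EA scores, rather than counting mutations, means that a gene hit by a few high-impact mutations can outrank a gene hit by many near-neutral ones. This is the intuition behind separating drivers from mutator-clone background noise.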


Subject(s)
Anti-Bacterial Agents, Drug Resistance, Bacterial, Escherichia coli Proteins, Escherichia coli, Anti-Bacterial Agents/pharmacology, Ciprofloxacin/pharmacology, Colistin/pharmacology, Drug Resistance, Bacterial/genetics, Escherichia coli/drug effects, Escherichia coli/genetics, Escherichia coli Proteins/genetics, Microbial Sensitivity Tests, Mutation
19.
Nucleic Acids Res ; 50(12): e70, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35412634

ABSTRACT

Discovering rare cancer driver genes is difficult because their mutational frequency is too low for statistical detection by computational methods. EPIMUTESTR is an integrative nearest-neighbor machine learning algorithm that identifies such marginal genes by modeling the fitness of their mutations with the phylogenetic Evolutionary Action (EA) score. Across cohorts of sequenced patients from The Cancer Genome Atlas representing 33 tumor types, EPIMUTESTR detected 214 previously inferred cancer driver genes and 137 new candidates never before identified computationally, seven of which are supported by the COSMIC Cancer Gene Census. EPIMUTESTR achieved better robustness and specificity than existing methods across a number of benchmark methods and datasets.


Subject(s)
Machine Learning, Neoplasms, Humans, Mutation, Neoplasms/genetics, Neoplasms/pathology, Oncogenes, Phylogeny
20.
Hum Genet ; 141(10): 1549-1577, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35488922

ABSTRACT

Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.


Subject(s)
Genetic Testing, Genome, Human, Computational Biology, Genetic Variation, Humans, Reproducibility of Results