ABSTRACT
Splice-switching oligonucleotides (SSOs) are antisense compounds that act directly on pre-mRNA to modulate alternative splicing (AS). This study demonstrates the value that artificial intelligence/machine learning (AI/ML) provides for the identification of functional, verifiable, and therapeutic SSOs. We trained XGBoost tree models using splicing factor (SF) pre-mRNA binding profiles and spliceosome assembly information to identify modulatory SSO binding sites on pre-mRNA. Using Shapley and out-of-bag analyses, we also predicted the identity of specific SFs whose binding to pre-mRNA is blocked by SSOs. This step adds considerable transparency to AI/ML-driven drug discovery and yields biological insights useful in further validation steps. We applied this approach to previously established functional SSOs to retrospectively identify the SFs likely to regulate those events. We then took a prospective validation approach using a novel target in triple-negative breast cancer (TNBC), NEDD4L exon 13 (NEDD4Le13). Targeting NEDD4Le13 with an AI/ML-designed SSO decreased the proliferative and migratory behavior of TNBC cells via downregulation of the TGFβ pathway. Overall, this study illustrates the ability of AI/ML to extract actionable insights from RNA-seq data.
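The Shapley analysis mentioned above attributes a model's output to its input features. The following is a minimal sketch of exact Shapley attribution on a toy scoring function, not the study's XGBoost pipeline. The feature names `SF_A`, `SF_B`, `SF_C` and the function `toy_score` are hypothetical placeholders introduced only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set-valued scoring function.

    `value` maps a frozenset of present features to a score.
    The cost is exponential in len(features), so this only suits toy cases.
    """
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


def toy_score(present):
    """Hypothetical score: SF_A acts alone, SF_B and SF_C only act together."""
    score = 0.0
    if "SF_A" in present:
        score += 0.5
    if "SF_B" in present and "SF_C" in present:
        score += 0.25
    return score


phi = shapley_values(["SF_A", "SF_B", "SF_C"], toy_score)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

In this toy case the additive feature receives its full effect, the two interacting features split their joint effect equally, and the attributions sum to the score of the full feature set. For tree ensembles, dedicated algorithms avoid the exponential enumeration used here.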
Subjects
Alternative Splicing , Artificial Intelligence , Machine Learning , Triple Negative Breast Neoplasms , Humans , Triple Negative Breast Neoplasms/genetics , Tumor Cell Line , Nedd4 Ubiquitin Protein Ligases/genetics , Nedd4 Ubiquitin Protein Ligases/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , Cell Proliferation/drug effects , Cell Proliferation/genetics , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism , Antisense Oligonucleotides/genetics , Cell Movement/genetics , Spliceosomes/metabolism , Spliceosomes/genetics , Oligonucleotides/genetics , Female
ABSTRACT
Population genomics of prokaryotes has been studied in depth in only a small number of primarily pathogenic bacteria, as genome sequences of isolates of diverse origin are lacking for most species. Here, we conducted a large-scale survey of population structure in prevalent human gut microbial species, sampled from their natural environment, with a culture-independent metagenomic approach. We examined the variation landscape of 71 species in 2,144 human fecal metagenomes and found that in 44 of these, accounting for 72% of the total assigned microbial abundance, single-nucleotide variation clearly indicates the existence of sub-populations (here termed subspecies). A single subspecies (per species) usually dominates within each host, as expected from ecological theory. At the global scale, geographic distributions of subspecies differ between phyla, with Firmicutes subspecies being significantly more geographically restricted. To investigate the functional significance of the delineated subspecies, we identified genes that consistently distinguish them in a manner that is independent of reference genomes. We further associated these subspecies-specific genes with properties of the microbial community and the host. For example, two of the three Eubacterium rectale subspecies consistently harbor an accessory pro-inflammatory flagellum operon that is associated with lower gut community diversity, higher host BMI, and higher blood fasting insulin levels. Using an additional 676 human oral samples, we further demonstrate the existence of niche specialized subspecies in the different parts of the oral cavity. Taken together, we provide evidence for subspecies in the majority of abundant gut prokaryotes, leading to a better functional and ecological understanding of the human gut microbiome in conjunction with its host.
Subjects
Gastrointestinal Microbiome , Microbiota , Escherichia coli/physiology , Gastrointestinal Microbiome/genetics , Bacterial Genes , Humans , Microbiota/genetics , Phenotype , Phylogeography , Species Specificity
ABSTRACT
BACKGROUND: Automation has been introduced into variant interpretation, but it is not known how automated variant interpretation performs on a stand-alone basis. The purpose of this study was to evaluate a fully automated computerized approach. METHOD: We reviewed all variants encountered in a set of carrier screening panels over a 1-year interval. Observed variants with high-confidence ClinVar interpretations were included in the analysis; those without high-confidence ClinVar entries were excluded. RESULTS: Discrepancy rates between automated interpretations and high-confidence ClinVar entries were analyzed. Of the variants interpreted as positive (likely pathogenic or pathogenic) based on ClinVar information, 22.6% were classified as negative (variants of uncertain significance, likely benign or benign) variants by the automated method. Of the ClinVar negative variants, 1.7% were classified as positive by the automated software. On a per-case basis, which accounts for variant frequency, 63.4% of cases with a ClinVar high-confidence positive variant were classified as negative by the automated method. CONCLUSION: While automation in genetic variant interpretation holds promise, there is still a need for manual review of the output. Additional validation of automated variant interpretation methods should be conducted.
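The gap between the per-variant and per-case discrepancy rates reported above arises when a frequently observed variant is misclassified. The sketch below shows that arithmetic on invented toy data. The `Variant` record, its field names, and the counts are hypothetical, and it assumes each case carries exactly one listed variant.

```python
from collections import namedtuple

# Hypothetical record: one row per distinct variant.
# case_count = number of screened cases in which the variant was observed.
Variant = namedtuple(
    "Variant", "name clinvar_positive automated_positive case_count"
)


def discrepancy_rates(variants):
    """Return (per_variant_rate, per_case_rate) for ClinVar-positive
    variants that the automated method called negative."""
    positives = [v for v in variants if v.clinvar_positive]
    missed = [v for v in positives if not v.automated_positive]
    # Each distinct variant counts once.
    per_variant = len(missed) / len(positives)
    # Each variant is weighted by how many cases carry it.
    per_case = (
        sum(v.case_count for v in missed)
        / sum(v.case_count for v in positives)
    )
    return per_variant, per_case


# Toy data (not from the study): one common variant is missed.
toy = [
    Variant("var_A", True, False, 60),
    Variant("var_B", True, True, 10),
    Variant("var_C", True, True, 20),
    Variant("var_D", True, True, 10),
    Variant("var_E", False, False, 500),
]

per_variant, per_case = discrepancy_rates(toy)
print(f"per-variant: {per_variant:.1%}, per-case: {per_case:.1%}")
```

In this toy set one of four positive variants is missed, but it accounts for 60 of the 100 positive cases, so the per-case rate is much higher than the per-variant rate.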
Subjects
Genetic Databases , Genetic Variation , Humans , Software
ABSTRACT
We present metaSNV, a tool for single nucleotide variant (SNV) analysis in metagenomic samples, capable of comparing populations of thousands of bacterial and archaeal species. The tool uses as input nucleotide sequence alignments to reference genomes in standard SAM/BAM format, performs SNV calling for individual samples and across the whole data set, and generates various statistics for individual species including allele frequencies and nucleotide diversity per sample as well as distances and fixation indices across samples. Using published data from 676 metagenomic samples of different sites in the oral cavity, we show that the results of metaSNV are comparable to those of MIDAS, an alternative implementation for metagenomic SNV analysis, while data processing is faster and has a smaller storage footprint. Moreover, we implement a set of distance measures that allow the comparison of genomic variation across metagenomic samples and delineate sample-specific variants to enable the tracking of specific strain populations over time. The implementation of metaSNV is available at: http://metasnv.embl.de/.
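Per-sample allele frequencies and nucleotide diversity, two of the statistics named above, can be derived from per-site base counts. The sketch below uses a common textbook estimator (average pairwise difference per site with a sample-size correction). It is an illustrative assumption, not necessarily the formula metaSNV implements, and all function names are hypothetical.

```python
def allele_frequencies(counts):
    """Convert per-base read counts at one site into allele frequencies."""
    total = sum(counts.values())
    return {base: c / total for base, c in counts.items()}


def site_diversity(counts):
    """Probability that two reads drawn without replacement differ."""
    n = sum(counts.values())
    if n < 2:
        return 0.0  # diversity is undefined with fewer than two reads
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


def nucleotide_diversity(variable_sites, genome_length):
    """Average per-site diversity over the genome.

    Sites not listed in variable_sites are treated as invariant
    and contribute zero.
    """
    return sum(site_diversity(c) for c in variable_sites) / genome_length


# Toy example: a single variable site covered by 10 reads.
site = {"A": 5, "G": 5}
freqs = allele_frequencies(site)
pi_site = site_diversity(site)
pi_genome = nucleotide_diversity([site], genome_length=100)
print(freqs, pi_site, pi_genome)
```

For the toy site, 25 of the 45 possible read pairs differ, which matches the estimator's value of 5/9.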