ABSTRACT
Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.
Subject(s)
Multiprotein Complexes/analysis, Protein Interaction Maps, Proteins/chemistry, Proteomics/methods, Humans, Tandem Mass Spectrometry
ABSTRACT
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
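As a minimal sketch of the idea reviewed above, and not of any specific method from the article, the following Python builds one PSN per data view from cosine similarity, keeps each patient's k most similar peers, and fuses the views by averaging edge weights. The function names, the choice of cosine similarity and the averaging rule are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_psn(features, k=2):
    """Build a PSN as {(i, j): weight} with i < j.

    features maps a patient id to its feature vector for ONE data view.
    Each patient is linked to its k most similar peers (an assumed
    sparsification rule; other thresholds are equally plausible).
    """
    ids = sorted(features)
    edges = {}
    for i in ids:
        sims = [(cosine(features[i], features[j]), j) for j in ids if j != i]
        sims.sort(reverse=True)
        for weight, j in sims[:k]:
            edges[(min(i, j), max(i, j))] = weight
    return edges


def fuse(networks):
    """Fuse several views by averaging edge weights (absent edge = 0)."""
    keys = set().union(*networks)
    return {e: sum(n.get(e, 0.0) for n in networks) / len(networks) for e in keys}


if __name__ == "__main__":
    omics = {"p1": [1.0, 0.0], "p2": [1.0, 0.1], "p3": [0.0, 1.0]}
    labs = {"p1": [0.2, 0.9], "p2": [0.1, 1.0], "p3": [0.9, 0.1]}
    print(fuse([build_psn(omics, k=1), build_psn(labs, k=1)]))
```

Simple averaging treats every view as equally informative; the fusion methods surveyed in the review exist precisely because that assumption often fails for heterogeneous data.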
Subject(s)
Algorithms, Machine Learning
ABSTRACT
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.
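For readers unfamiliar with how results from several imputed datasets are combined, the sketch below applies Rubin's rules to pool a coefficient estimated once per imputed dataset. The abstract does not state which pooling rule the authors used, so this is an assumption for illustration; the function name and the example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # spread across imputations (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical log-odds ratios from three imputed copies of a dataset.
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var)
```

The between-imputation term is what a complete-case analysis cannot provide: it widens the confidence interval in proportion to how much the imputations disagree.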
Subject(s)
COVID-19, Humans, Algorithms, Research Design, Bias, Probability
ABSTRACT
BACKGROUND: Genome-wide ligation-based assays such as Hi-C provide us with an unprecedented opportunity to investigate the spatial organization of the genome. Results of a typical Hi-C experiment are often summarized in a chromosomal contact map, a matrix whose elements reflect the co-location frequencies of genomic loci. To elucidate the complex structural and functional interactions between those genomic loci, networks offer a natural and powerful framework. RESULTS: We propose a novel graph-theoretical framework, the Corrected Gene Proximity (CGP) map, to study the effect of the 3D spatial organization of genes in transcriptional regulation. The starting point of the CGP map is a weighted network, the gene proximity map, whose weights are based on the contact frequencies between genes extracted from genome-wide Hi-C data. We derive a null model for the network based on the signal contributed by the 1D genomic distance and use it to "correct" the gene proximity for cell-type-specific 3D arrangements. The CGP map, therefore, provides a network framework for the 3D structure of the genome on a global scale. On human cell lines, we show that the CGP map can detect and quantify gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies. Analyzing the expression pattern of metabolic pathways of two hematopoietic cell lines, we find that the relative positioning of the genes, as captured and quantified by the CGP, is highly correlated with their expression change. We further show that the CGP map can be used to form an inter-chromosomal proximity map that allows large-scale abnormalities, such as chromosomal translocations, to be identified. CONCLUSIONS: The Corrected Gene Proximity map is a map of the 3D structure of the genome on a global scale. It allows the simultaneous analysis of intra- and inter-chromosomal interactions and of gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies, thus revealing hidden associations between global spatial positioning and gene expression. The flexible graph-based formalism of the CGP map can be easily generalized to study any existing Hi-C datasets.
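The abstract does not give the form of the null model, so the following is only a sketch of one common way to remove the 1D-distance signal from contact data: divide each observed contact frequency by the mean frequency of all gene pairs at a similar genomic distance (an observed/expected ratio). The function name, the binning scheme and the ratio itself are assumptions, not the authors' CGP definition.

```python
from collections import defaultdict


def distance_corrected(contacts, positions, bin_size):
    """Return observed/expected contact ratios for gene pairs on one chromosome.

    contacts:  {(gene_a, gene_b): contact_frequency}
    positions: {gene: genomic coordinate in base pairs}
    bin_size:  width of the 1D-distance bins used to estimate the expectation
    """
    def distance_bin(pair):
        a, b = pair
        return abs(positions[a] - positions[b]) // bin_size

    sums = defaultdict(float)
    counts = defaultdict(int)
    for pair, freq in contacts.items():
        d = distance_bin(pair)
        sums[d] += freq
        counts[d] += 1

    corrected = {}
    for pair, freq in contacts.items():
        d = distance_bin(pair)
        expected = sums[d] / counts[d]   # mean contact at this 1D distance
        corrected[pair] = freq / expected if expected else 0.0
    return corrected


if __name__ == "__main__":
    pos = {"A": 0, "B": 100, "C": 200}
    raw = {("A", "B"): 10.0, ("B", "C"): 30.0, ("A", "C"): 5.0}
    print(distance_corrected(raw, pos, bin_size=100))
```

A ratio above 1 marks a pair that is in contact more often than its linear separation alone would predict, which is the kind of signal a distance-corrected map is meant to expose.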
Subject(s)
Human Chromosomes, Gene Expression Regulation, Human Genome, Cell Line, Genomics/methods, Humans, Metabolic Networks and Pathways/genetics
ABSTRACT
Network medicine approaches have been largely successful at increasing our knowledge of molecularly characterized diseases. Given a set of disease genes associated with a disease, neighbourhood-based methods and random walkers exploit the interactome allowing the prediction of further genes for that disease. In general, however, diseases with no known molecular basis constitute a challenge. Here we present a novel network approach to prioritize gene-disease associations that is able to also predict genes for diseases with no known molecular basis. Our method, which we have called Cardigan (ChARting DIsease Gene AssociatioNs), uses semi-supervised learning and exploits a measure of similarity between disease phenotypes. We evaluated its performance at predicting genes for both molecularly characterized and uncharacterized diseases in OMIM, using both weighted and binary interactomes, and compared it with state-of-the-art methods. Our tests, which use datasets collected at different points in time to replicate the dynamics of the disease gene discovery process, prove that Cardigan is able to accurately predict disease genes for molecularly uncharacterized diseases. Additionally, standard leave-one-out cross validation tests show how our approach outperforms state-of-the-art methods at predicting genes for molecularly characterized diseases by 14%-65%. Cardigan can also be used for disease module prediction, where it outperforms state-of-the-art methods by 87%-299%.
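The random-walk baselines mentioned above can be sketched as a random walk with restart from the known disease genes. This is not Cardigan itself, whose semi-supervised formulation the abstract does not detail; the function name, the restart probability and the toy interactome are assumptions for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visit probability.

    adj:   {node: set of neighbours} for an undirected, unweighted interactome
    seeds: known disease genes; the walker restarts uniformly among them
    Note: a node with no neighbours simply loses its walking mass here.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            degree = len(adj[u])
            if degree == 0:
                continue
            share = (1.0 - restart) * p[u] / degree
            for v in adj[u]:
                nxt[v] += share
        if max(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Toy path graph a - b - c with one seed gene.
    interactome = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    scores = random_walk_with_restart(interactome, seeds={"a"})
    for gene in sorted(scores, key=scores.get, reverse=True):
        print(gene, round(scores[gene], 4))
```

Because the walker needs at least one seed, this kind of baseline cannot rank genes for a disease with no known genes, which is the gap the abstract says Cardigan addresses through phenotype similarity.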
Subject(s)
Inborn Genetic Diseases/genetics, Computational Biology/methods, Genetic Databases, Inborn Genetic Diseases/diagnosis, Humans, Phenotype
ABSTRACT
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported.
Subject(s)
Amino Acid Substitution, Neoplasms/genetics, Proteome/genetics, Web Browser, Algorithms, Cluster Analysis, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Tertiary Protein Structure, Proteome/chemistry
ABSTRACT
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
Subject(s)
Computational Biology/methods, Molecular Biology/methods, Molecular Sequence Annotation, Proteins/physiology, Algorithms, Animals, Protein Databases, Exoribonucleases/classification, Exoribonucleases/genetics, Exoribonucleases/physiology, Forecasting, Humans, Proteins/chemistry, Proteins/classification, Proteins/genetics, Species Specificity
ABSTRACT
We introduce clustering with overlapping neighborhood expansion (ClusterONE), a method for detecting potentially overlapping protein complexes from protein-protein interaction data. ClusterONE-derived complexes for several yeast data sets showed better correspondence with reference complexes in the Munich Information Center for Protein Sequence (MIPS) catalog and complexes derived from the Saccharomyces Genome Database (SGD) than the results of seven popular methods. The results also showed a high extent of functional homogeneity.
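The abstract does not state the scoring function, so the sketch below assumes a cohesiveness-style score: internal edge weight divided by internal weight plus boundary weight plus a per-node penalty. It grows a group greedily from a seed by adding whichever neighbour raises that score most. The score's exact form, the function names and the toy network are assumptions; the sketch only adds nodes and leaves out node removal and the merging of overlapping groups.

```python
def cohesiveness(graph, group, penalty=0.0):
    """Internal weight / (internal + boundary weight + penalty * size).

    graph: {node: {neighbour: weight}} for an undirected weighted network.
    """
    group = set(group)
    w_in = 0.0
    w_bound = 0.0
    for u in group:
        for v, w in graph[u].items():
            if v in group:
                w_in += w / 2.0   # each internal edge is seen from both ends
            else:
                w_bound += w
    denom = w_in + w_bound + penalty * len(group)
    return w_in / denom if denom else 0.0


def grow(graph, seed, penalty=0.0):
    """Greedily add the neighbour that most improves cohesiveness."""
    group = {seed}
    while True:
        frontier = {v for u in group for v in graph[u]} - group
        best, best_score = None, cohesiveness(graph, group, penalty)
        for v in sorted(frontier):
            score = cohesiveness(graph, group | {v}, penalty)
            if score > best_score:
                best, best_score = v, score
        if best is None:
            return group
        group.add(best)


if __name__ == "__main__":
    # Two triangles joined by one weak interaction (c - d).
    ppi = {
        "a": {"b": 1.0, "c": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"a": 1.0, "b": 1.0, "d": 0.1},
        "d": {"c": 0.1, "e": 1.0, "f": 1.0},
        "e": {"d": 1.0, "f": 1.0},
        "f": {"d": 1.0, "e": 1.0},
    }
    print(sorted(grow(ppi, "a")))
```

Running the growth from different seeds can return groups that share proteins, which is how a seed-and-grow scheme accommodates overlapping complexes.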
Subject(s)
Protein Interaction Mapping/methods, Protein Interaction Maps, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Algorithms, Cluster Analysis
ABSTRACT
SUMMARY: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. CONTACT: alberto@cs.rhul.ac.uk AVAILABILITY: GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines.
Subject(s)
Data Mining/methods, Gene Ontology, Internet, Semantics, Software, Proteins/genetics, Controlled Vocabulary
ABSTRACT
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
Subject(s)
Molecular Sequence Annotation, Multigene Family, Tertiary Protein Structure, Proteins/classification, Software, Algorithms, Amino Acid Sequence, Animals, Arabidopsis Proteins/chemistry, Arabidopsis Proteins/classification, Arabidopsis Proteins/genetics, Consensus Sequence, Genomics/methods, Mice, Tertiary Protein Structure/genetics, Proteins/chemistry, Proteins/genetics, Protein Sequence Analysis
ABSTRACT
From a network medicine perspective, a disease is the consequence of perturbations on the interactome. These perturbations tend to appear in a specific neighbourhood on the interactome, the disease module, and modules related to phenotypically similar diseases tend to be located in close-by regions. We present LanDis, a freely available web-based interactive tool ( https://paccanarolab.org/landis ) that allows domain experts, medical doctors and the larger scientific community to graphically navigate the interactome distances between the modules of over 44 million pairs of heritable diseases. The map-like interface provides detailed comparisons between pairs of diseases together with supporting evidence. Every disease in LanDis is linked to relevant entries in OMIM and UniProt, providing a starting point for in-depth analysis and an opportunity for novel insight into the aetiology of diseases as well as differential diagnosis.
ABSTRACT
MOTIVATION: Several measures have been recently proposed for quantifying the functional similarity between gene products according to well-structured controlled vocabularies where biological terms are organized in a tree or in a directed acyclic graph (DAG) structure. However, existing semantic similarity measures ignore two important facts. First, when calculating the similarity between two terms, they disregard the descendants of these terms. While this makes no difference when the ontology is a tree, we shall show that it has important consequences when the ontology is a DAG; this is the case, for example, with the Gene Ontology (GO). Second, existing similarity measures do not model the inherent uncertainty which comes from the fact that our current knowledge of the gene annotation and of the ontology structure is incomplete. Here, we propose a novel approach based on downward random walks that can be used to improve any of the existing similarity measures to exhibit these two properties. The approach is computationally efficient: random walks do not need to be simulated, as we provide formulas to calculate their stationary distributions. RESULTS: To show that our approach can potentially improve any semantic similarity measure, we test it on six different semantic similarity measures: three commonly used measures by Resnik (1999), Lin (1998), and Jiang and Conrath (1997); and three recently proposed measures: simUI, simGIC by Pesquita et al. (2008); GraSM by Couto et al. (2007); and Couto and Silva (2011). We applied these improved measures to the GO annotations of the yeast Saccharomyces cerevisiae, and tested how they correlate with sequence similarity, mRNA co-expression and protein-protein interaction data. Our results consistently show that the use of downward random walks leads to more reliable similarity measures.
Subject(s)
Algorithms, Semantics, Controlled Vocabulary, Molecular Sequence Annotation, Multiprotein Complexes/genetics, Proteins/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Terminology as Topic, Uncertainty
ABSTRACT
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Subject(s)
Escherichia coli Proteins/genetics, Escherichia coli/genetics, Bacterial Genome, Proteome/genetics, Escherichia coli/metabolism, Escherichia coli Proteins/metabolism, Multiprotein Complexes/genetics, Protein Interaction Mapping/methods
ABSTRACT
Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
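For orientation, the core loop of Markov clustering alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing). The pure-Python sketch below is a simplified illustration under assumed parameters (inflation of 2, fixed iteration count, dense matrices); it is not the implementation or the settings used in the study.

```python
def markov_cluster(adj, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster an undirected weighted graph with a basic MCL loop.

    adj: {node: {neighbour: weight}}; returns a set of frozensets of nodes.
    """
    nodes = sorted(adj)
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Adjacency matrix with self-loops, stored as m[row][column].
    m = [[0.0] * n for _ in range(n)]
    for u, neighbours in adj.items():
        m[index[u]][index[u]] = 1.0
        for v, w in neighbours.items():
            m[index[v]][index[u]] = w

    def normalize(mat):
        """Scale every column to sum to 1."""
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            if total:
                for i in range(n):
                    mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: flow spreads along two-step paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strong flows are boosted, weak flows fade.
        m = normalize([[x ** inflation for x in row] for row in m])

    # Each row that still carries flow defines one cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > eps)
                for i in range(n)}
    clusters.discard(frozenset())
    return clusters


if __name__ == "__main__":
    triangle = lambda x, y, z: {x: {y: 1.0, z: 1.0}, y: {x: 1.0, z: 1.0}, z: {x: 1.0, y: 1.0}}
    network = {**triangle("a", "b", "c"), **triangle("d", "e", "f")}
    print(markov_cluster(network))
```

Higher inflation values generally yield more, smaller clusters, so that parameter largely determines complex size in any MCL-based analysis.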
Subject(s)
Proteome/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Biological Evolution, Conserved Sequence, Mass Spectrometry, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Binding, Proteome/chemistry, Proteomics, Saccharomyces cerevisiae Proteins/chemistry
ABSTRACT
Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.
Subject(s)
Genomics, Microbiology, Amino Acids/biosynthesis, Biosensing Techniques, Lipid Metabolism, Polysaccharides/metabolism
ABSTRACT
Early and accurate detection of side effects is critical for the clinical success of drugs under development. Here, we aim to predict unknown side effects for drugs with a small number of side effects identified in randomized controlled clinical trials. Our machine learning framework, the geometric self-expressive model (GSEM), learns globally optimal self-representations for drugs and side effects from pharmacological graph networks. We show the usefulness of the GSEM on 505 therapeutically diverse drugs and 904 side effects from multiple human physiological systems. Here, we also show a data integration strategy that could be adopted to improve the ability of side effect prediction models to identify unknown side effects that might only appear after the drug enters the market.
Subject(s)
Computational Biology, Drug-Related Side Effects and Adverse Reactions, Humans, Drug-Related Side Effects and Adverse Reactions/diagnosis, Machine Learning, Randomized Controlled Trials as Topic
ABSTRACT
We present two machine learning approaches for drug repurposing. While we have developed them for COVID-19, they are disease-agnostic. The two methodologies are complementary, targeting SARS-CoV-2 and host factors, respectively. Our first approach consists of a matrix factorization algorithm to rank broad-spectrum antivirals. Our second approach, based on network medicine, uses graph kernels to rank drugs according to the perturbation they induce on a subnetwork of the human interactome that is crucial for SARS-CoV-2 infection/replication. Our experiments show that our top predicted broad-spectrum antivirals include drugs indicated for compassionate use in COVID-19 patients; and that the ranking obtained by our kernel-based approach aligns with experimental data. Finally, we present the COVID-19 repositioning explorer (CoREx), an interactive online tool to explore the interplay between drugs and SARS-CoV-2 host proteins in the context of biological networks, protein function, drug clinical use, and Connectivity Map. CoREx is freely available at: https://paccanarolab.org/corex/.
ABSTRACT
Despite the development of specific therapies against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the continuous investigation of the mechanism of action of clinically approved drugs could provide new information on the druggable steps of virus-host interaction. For example, chloroquine (CQ)/hydroxychloroquine (HCQ) lack in vitro activity against SARS-CoV-2 in TMPRSS2-expressing cells, such as the human pneumocyte cell line Calu-3, and likewise failed to show clinical benefit in the Solidarity and Recovery clinical trials. Another antimalarial drug, mefloquine, which is not a 4-aminoquinoline like CQ/HCQ, has emerged as a potential anti-SARS-CoV-2 antiviral in vitro and has also been previously repurposed for respiratory diseases. Here, we investigated the anti-SARS-CoV-2 mechanism of action of mefloquine in cells relevant for the physiopathology of COVID-19, such as Calu-3 cells (that recapitulate type II pneumocytes) and monocytes. Molecular pathways modulated by mefloquine were assessed by differential expression analysis, and confirmed by biological assays. A PBPK model was developed to assess mefloquine's optimal doses for achieving therapeutic concentrations. Mefloquine inhibited SARS-CoV-2 replication in Calu-3, with an EC50 of 1.2 µM and EC90 of 5.3 µM. It reduced SARS-CoV-2 RNA levels in monocytes and prevented virus-induced enhancement of IL-6 and TNF-α. Mefloquine reduced SARS-CoV-2 entry and synergized with Remdesivir. Mefloquine's pharmacological parameters are consistent with its plasma exposure in humans, and its predicted tissue-to-plasma coefficient suggests that mefloquine may accumulate in the lungs. Altogether, our data indicate that mefloquine's chemical structure could represent an orally available host-acting agent to inhibit virus entry.
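The reported EC50 and EC90 can be related through a standard Hill dose-response curve. Assuming that model (the abstract does not say which curve was fitted), the ratio EC90/EC50 equals 9^(1/h), so the two reported values imply a Hill slope h of roughly 1.5. The function names below are hypothetical.

```python
import math


def hill_slope(ec50, ec90):
    """Hill coefficient implied by EC50 and EC90 under a Hill model.

    At 90 % effect, (EC90 / EC50) ** h = 0.9 / 0.1 = 9.
    """
    return math.log(9.0) / math.log(ec90 / ec50)


def fraction_inhibited(conc, ec50, h):
    """Fractional effect at a given concentration (same units as ec50)."""
    return conc ** h / (ec50 ** h + conc ** h)


if __name__ == "__main__":
    h = hill_slope(1.2, 5.3)   # values in µM, taken from the abstract
    print(f"implied Hill slope: {h:.2f}")
    for c in (0.5, 1.2, 5.3, 10.0):
        print(f"{c:>5} µM -> {fraction_inhibited(c, 1.2, h):.2f}")
```

This is a back-of-envelope consistency check only; it says nothing about whether such concentrations are reached in lung tissue, which is the question the PBPK model addresses.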
Subject(s)
Alveolar Epithelial Cells/drug effects, Antiviral Agents/pharmacology, Chloroquine/pharmacology, Mefloquine/pharmacology, SARS-CoV-2/drug effects, Adenosine Monophosphate/analogs & derivatives, Adenosine Monophosphate/pharmacology, Alanine/analogs & derivatives, Alanine/pharmacology, Alveolar Epithelial Cells/virology, Cell Line, Drug Repositioning/methods, Humans, Serine Endopeptidases/genetics, Virus Internalization/drug effects, COVID-19 Drug Treatment
ABSTRACT
MicroRNAs (miRNAs) are small non-coding RNAs involved in post-transcriptional gene regulation that have a major impact on many diseases and provide an exciting avenue towards antiviral therapeutics. From patient transcriptomic data, we have discovered a circulating miRNA, miR-2392, that is directly involved with SARS-CoV-2 machinery during host infection. Specifically, we show that miR-2392 is key in driving downstream suppression of mitochondrial gene expression, increasing inflammation, glycolysis, and hypoxia as well as promoting many symptoms associated with COVID-19 infection. We demonstrate that miR-2392 is present in the blood and urine of COVID-19 positive patients, but not detected in COVID-19 negative patients. These findings indicate the potential for developing a novel, minimally invasive, COVID-19 detection method. Lastly, using in vitro human and in vivo hamster models, we have developed a novel miRNA-based antiviral therapeutic that targets miR-2392, significantly reduces SARS-CoV-2 viability in hamsters and may potentially inhibit a COVID-19 disease state in humans.
ABSTRACT
MicroRNAs (miRNAs) are small non-coding RNAs involved in post-transcriptional gene regulation that have a major impact on many diseases and provide an exciting avenue toward antiviral therapeutics. From patient transcriptomic data, we determined that a circulating miRNA, miR-2392, is directly involved with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) machinery during host infection. Specifically, we show that miR-2392 is key in driving downstream suppression of mitochondrial gene expression, increasing inflammation, glycolysis, and hypoxia, as well as promoting many symptoms associated with coronavirus disease 2019 (COVID-19) infection. We demonstrate that miR-2392 is present in the blood and urine of patients positive for COVID-19 but is not present in patients negative for COVID-19. These findings indicate the potential for developing a minimally invasive COVID-19 detection method. Lastly, using in vitro human and in vivo hamster models, we design a miRNA-based antiviral therapeutic that targets miR-2392, significantly reduces SARS-CoV-2 viability in hamsters, and may potentially inhibit a COVID-19 disease state in humans.