Results 1 - 20 of 67
1.
Article in English | MEDLINE | ID: mdl-38822995

ABSTRACT

PURPOSE OF REVIEW: This review aims to explore the interface between artificial intelligence (AI) and chronic pain, seeking to identify areas of focus for enhancing current treatments and yielding novel therapies. RECENT FINDINGS: In the United States, the prevalence of chronic pain is estimated to be upwards of 40%. Its impact extends to increased healthcare costs, reduced economic productivity, and strain on healthcare resources. Addressing this condition is particularly challenging due to its complexity and the significant variability in how patients respond to treatment. Current options often struggle to provide long-term relief, with their benefits rarely outweighing the risks, such as dependency or other side effects. Currently, AI has impacted four key areas of chronic pain treatment and research: (1) predicting outcomes based on clinical information; (2) extracting features from text, specifically clinical notes; (3) modeling 'omic data to identify meaningful patient subgroups with potential for personalized treatments and improved understanding of disease processes; and (4) disentangling complex neuronal signals responsible for pain, which current therapies attempt to modulate. As AI advances, leveraging state-of-the-art architectures will be essential for improving chronic pain treatment. Current efforts aim to extract meaningful representations from complex data, paving the way for personalized medicine. The identification of unique patient subgroups should reveal targets for tailored chronic pain treatments. Moreover, enhancing current treatment approaches is achievable by gaining a more profound understanding of patient physiology and responses. This can be realized by leveraging AI on the increasing volume of data linked to chronic pain.

2.
J Chem Inf Model ; 64(9): 3826-3840, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38696451

ABSTRACT

Recent advances in computational methods provide the promise of dramatically accelerating drug discovery. While mathematical modeling and machine learning have become vital in predicting drug-target interactions and properties, there is untapped potential in computational drug discovery due to the vast and complex chemical space. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method called fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library, fragmenting them while attaching specific attributes based on predicted binding affinity and interaction with the target subdomain. In this paper, we further propose a two-stage optimization method that utilizes the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information for optimization shrinks the search space and focuses on promising regions, thereby improving the optimization for candidate ligands. The first optimization stage assembles these fragments into larger compounds using genetic algorithms, followed by a second stage of iterative refinement to produce compounds with enhanced bioactivity. To demonstrate broad applicability, the methodology is demonstrated on three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, the proposed FDSL-DD and a two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method accounting for drug-likeness can still produce potential candidate ligands with a high binding affinity. 
Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly optimize the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.


Subjects
Drug Discovery, Ligands, Drug Discovery/methods, Humans, SARS-CoV-2/metabolism, Algorithms, COVID-19 Drug Treatment, COVID-19/virology
3.
J Mol Graph Model ; 127: 108669, 2024 03.
Article in English | MEDLINE | ID: mdl-38011826

ABSTRACT

Fragment-based drug design (FBDD) is a major drug discovery method employed in computer-aided drug discovery. Due to its inherent limitations, this process experiences long processing times and limited success rates. Here we present a new Fragment Databases from Screened Ligands Drug Design method (FDSL-DD) that intelligently incorporates information about fragment characteristics into a fragment-based design approach to the drug development process. The initial step of the FDSL-DD is the creation of a fragment database from a library of docked, drug-like ligands for a specific target, which deviates from the traditional in silico FBDD strategy, incorporating structure-based design screening techniques to combine the advantages of both approaches. Three different protein targets have been tested in this study to demonstrate the potential of the created fragment library and FDSL-DD. Utilizing the FDSL-DD led to an increase in binding affinity for each protein target. The most substantial increase was exhibited by the ligand designed for TIPE2, with a 3.6 kcal mol⁻¹ difference between the top ligand from the FDSL-DD and the top ligand from the high throughput virtual screening (HTVS). Using drug-like ligands in the initial HTVS allows for a greater search of chemical space, with higher efficiency in fragment selection, fewer grid boxes, and potentially more identified interactions.


Subjects
Drug Design, Drug Discovery, Ligands, Drug Discovery/methods, High-Throughput Screening Assays, Factual Databases
4.
ISME J ; 17(10): 1751-1764, 2023 10.
Article in English | MEDLINE | ID: mdl-37558860

ABSTRACT

While genome sequencing has expanded our knowledge of symbiosis, role assignment within multi-species microbiomes remains challenging due to genomic redundancy and the uncertainties of in vivo impacts. We address such questions here for a specialized nitrogen (N) recycling microbiome of turtle ants, describing a new genus and species of gut symbiont, Ischyrobacter davidsoniae (Betaproteobacteria: Burkholderiales: Alcaligenaceae), and its in vivo physiological context. A re-analysis of amplicon sequencing data, with precisely assigned Ischyrobacter reads, revealed a seemingly ubiquitous distribution across the turtle ant genus Cephalotes, suggesting ≥50 million years since domestication. Through new genome sequencing, we also show that divergent I. davidsoniae lineages are conserved in their uricolytic and urea-generating capacities. With phylogenetically refined definitions of Ischyrobacter and separately domesticated Burkholderiales symbionts, our FISH microscopy revealed a distinct niche for I. davidsoniae, with dense populations at the anterior ileum. Being positioned at the site of host N-waste delivery, in vivo metatranscriptomics and metabolomics further implicate I. davidsoniae within a symbiont-autonomous N-recycling pathway. While encoding much of this pathway, I. davidsoniae expressed only a subset of the requisite steps in mature adult workers, including the penultimate step deriving urea from allantoate. The remaining steps were expressed by other specialized gut symbionts. Collectively, this assemblage converts inosine, made from midgut symbionts, into urea and ammonia in the hindgut. With urea supporting host amino acid budgets and cuticle synthesis, and with the ancient nature of other active N-recyclers discovered here, I. davidsoniae emerges as a central player in a conserved and impactful, multipartite symbiosis.


Subjects
Ants, Nitrogen, Animals, Ants/physiology, Phylogeny, Symbiosis/genetics, Urea
6.
BMC Genomics ; 24(1): 212, 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37095444

ABSTRACT

BACKGROUND: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. METHODS: Here, we analyzed biospecimens from 22 eoRCC patients that were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. RESULTS: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood monocytes (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. CONCLUSIONS: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.


Subjects
Renal Cell Carcinoma, Kidney Neoplasms, Humans, Genetic Predisposition to Disease, DNA Replication, Germ-Line Mutation, Germ Cells
7.
PeerJ ; 11: e14779, 2023.
Article in English | MEDLINE | ID: mdl-36785708

ABSTRACT

A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e., the degree to which an individual cluster includes only related sequences, and completeness, i.e., the degree to which related sequences are kept together in one cluster rather than broken up into multiple clusters. Most algorithms are conservative in grouping sequences with other sequences. Remote homologs may fail to be clustered together and instead form unnecessarily distinct clusters. The resulting clusters have high homogeneity but completeness that is too low. We propose Complet+, a computationally scalable post-processing method to increase the completeness of clusters without an undue cost in homogeneity. Complet+ proves to effectively merge closely related clusters of proteins that have verified structural relationships in the SCOPe classification scheme, improving the completeness of clustering results at little cost to homogeneity. Applying Complet+ to clusters obtained using MMseqs2's clusterupdate increases V-measure by 0.09 and 0.05 at the SCOPe superfamily and family levels, respectively. Complet+ also creates more biologically representative clusters, as shown by a substantial increase in Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI) metrics when comparing predicted clusters to biological classifications. Complet+ similarly improves clustering metrics when applied to other methods, such as CD-HIT and linclust. Finally, we show that Complet+ runtime scales linearly with respect to the number of clusters being post-processed on a COG dataset of over 3 million sequences. Code and supplementary information are available on GitHub: https://github.com/EESI/Complet-Plus.
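
The homogeneity, completeness, and V-measure quantities named in this abstract can be illustrated with a small sketch. This is not the Complet+ code; it is a minimal, standard-library implementation of the usual entropy-based definitions, and the toy labels (`classes`, `conservative`, `merged`) are invented for illustration only.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, V-measure)."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_classes == 0 else 1.0 - conditional_entropy(classes, clusters) / h_classes
    completeness = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v


# Hypothetical toy data: superfamily "A" is split across two clusters by a
# conservative clustering, then merged by a post-processing step.
classes = ["A", "A", "A", "A", "B", "B"]
conservative = [0, 0, 1, 1, 2, 2]
merged = [0, 0, 0, 0, 2, 2]

print(v_measure(classes, conservative))  # homogeneity 1.0, completeness below 1
print(v_measure(classes, merged))        # both reach 1.0 after merging
```

In the toy case the conservative clustering is perfectly homogeneous but incomplete, which is the situation the abstract describes; merging the two clusters of related sequences raises completeness without lowering homogeneity.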


Subjects
Algorithms, Proteins, Sequence Alignment, Amino Acid Sequence, Proteins/chemistry, Cluster Analysis
8.
Biology (Basel) ; 11(12)2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36552295

ABSTRACT

Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent input to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data.

9.
Comput Biol Med ; 149: 105969, 2022 10.
Article in English | MEDLINE | ID: mdl-36041271

ABSTRACT

Epidemiological studies show that COVID-19 variants-of-concern, like Delta and Omicron, pose different risks for severe disease, but they typically lack sequence-level information for the virus. Studies which do obtain viral genome sequences are generally limited in time, location, and population scope. Retrospective meta-analyses require time-consuming data extraction from heterogeneous formats and are limited to publicly available reports. Fortuitously, a subset of GISAID, the global SARS-CoV-2 sequence repository, includes "patient status" metadata that can indicate whether a sequence record is associated with mild or severe disease. While GISAID lacks data on comorbidities relevant to severity, such as obesity and chronic disease, it does include metadata for age and sex to use as additional attributes in modeling. With these caveats, previous efforts have demonstrated that genotype-patient status models can be fit to GISAID data, particularly when country-of-origin is used as an additional feature. But are these models robust and biologically meaningful? This paper shows that, in fact, temporal and geographic biases in sequences submitted to GISAID, as well as the evolving pandemic response, particularly reduction in severe disease due to vaccination, create complex issues for model development and interpretation. This paper poses a potential solution: efficient mixed effects machine learning using GPBoost, treating country as a random effect group. Training and validation using temporally split GISAID data and emerging Omicron variants demonstrates that GPBoost models are more predictive of the impact of spike protein mutations on patient outcomes than fixed effect XGBoost, LightGBM, random forests, and elastic net logistic regression models.


Subjects
COVID-19, Coronavirus Spike Glycoprotein, COVID-19/epidemiology, Humans, Machine Learning, Mutation, Phylogeny, Retrospective Studies, SARS-CoV-2, Severity of Illness Index, Coronavirus Spike Glycoprotein/genetics
10.
mSystems ; 7(2): e0003522, 2022 04 26.
Article in English | MEDLINE | ID: mdl-35311562

ABSTRACT

Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a powerful way to build complex sequence-to-phenotype models. Unfortunately, while it can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.


Subjects
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, Pandemics/prevention & control, High-Throughput Nucleotide Sequencing
11.
Bioinformatics ; 38(8): 2344-2347, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35157026

ABSTRACT

MOTIVATION: The analysis of mutational signatures is becoming increasingly common in cancer genetics, with emerging implications in cancer evolution, classification, treatment decisions and prognosis. Recently, several packages have been developed for mutational signature analysis, with each using different methodology and yielding significantly different results. Because of the non-trivial differences in tools' refitting results, researchers may desire to survey and compare the available tools, in order to objectively evaluate the results for their specific research question, such as which mutational signatures are prevalent in different cancer types. RESULTS: Due to the need for effective comparison of refitting mutational signatures, we introduce a user-friendly software tool that can aggregate and visually present results from different refitting packages. AVAILABILITY AND IMPLEMENTATION: MetaMutationalSigs is implemented using R and Python, can be installed using Docker, and is available at: https://github.com/EESI/MetaMutationalSigs.


Subjects
Neoplasms, Software, Humans, Mutation, Neoplasms/genetics
12.
Integr Comp Biol ; 61(6): 2282-2293, 2022 02 05.
Article in English | MEDLINE | ID: mdl-34151345

ABSTRACT

Scientific culture and structure organize biological sciences in many ways. We make choices concerning the systems and questions we study. Our research then amplifies these choices into factors that influence the directions of future research by shaping our hypotheses, data analyses, interpretation, publication venues, and dissemination via other methods. But our choices are shaped by more than objective curiosity: we are influenced by cultural paradigms reinforced by societal upbringing and scientific indoctrination during training. This extends to the systems and data that we consider to be ethically obtainable or available for study, and who is considered qualified to do research, ask questions, and communicate about research. It is also influenced by the profitability of concepts like open access, a system designed to improve equity, but which enacts gatekeeping in unintended but foreseeable ways. Creating truly integrative biology programs will require more than intentionally developing departments or institutes that allow overlapping expertise in two or more subfields of biology. Interdisciplinary work requires the expertise of large and diverse teams of scientists working together; this is impossible without an authentic commitment to addressing, not denying, racism when practiced by individuals, institutions, and cultural aspects of academic science. We have identified starting points for remedying how our field has discouraged and caused harm, but we acknowledge there is a long path forward. This path must be paved with field-wide solutions and institutional buy-in: our solutions must match the scale of the problem. Together, we can integrate (not reintegrate) the nuances of biology into our field.


Subjects
Biology, Animals
13.
PLoS Comput Biol ; 17(9): e1009345, 2021 09.
Article in English | MEDLINE | ID: mdl-34550967

ABSTRACT

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command line tool).


Subjects
Deep Learning, Microbiota/genetics, Neural Networks (Computer), 16S Ribosomal RNA/genetics, Algorithms, Computational Biology, Genetic Databases, Gastrointestinal Microbiome/genetics, Host Microbial Interactions/genetics, Humans, Inflammatory Bowel Diseases/microbiology, Natural Language Processing, Phenotype, Prevotella/classification, Prevotella/genetics, Prevotella/isolation & purification, Proof of Concept Study, 16S Ribosomal RNA/classification
14.
Front Genet ; 12: 628758, 2021.
Article in English | MEDLINE | ID: mdl-33868369

ABSTRACT

RRM2B plays a crucial role in DNA replication, repair and oxidative stress. While germline RRM2B mutations have been implicated in mitochondrial disorders, its relevance to cancer has not been established. Here, using TCGA studies, we investigated RRM2B alterations in cancer. We found that RRM2B is highly amplified in multiple tumor types, particularly in MYC-amplified tumors, and is associated with increased RRM2B mRNA expression. We also observed that the chromosomal region 8q22.3-8q24 is amplified in multiple tumors, and includes RRM2B and MYC along with several other cancer-associated genes. An analysis of genes within this 8q-amplicon showed that cancers that have both RRM2B and MYC amplified have a distinct pattern of amplification compared to cancers that are unaltered or those that have amplifications in RRM2B or MYC only. Investigation of curated biological interactions revealed that gene products of the amplified 8q22.3-8q24 region have important roles in DNA repair, DNA damage response, oxygen sensing, and apoptosis pathways and interact functionally. Notably, RRM2B-amplified cancers are characterized by mutation signatures of defective DNA repair and oxidative stress, and at least RRM2B-amplified breast cancers are associated with poor clinical outcome. These data suggest alterations in RRM2B and possibly the interacting 8q-proteins could have a profound effect on regulatory pathways such as DNA repair and cellular survival, highlighting therapeutic opportunities in these cancers.

15.
Front Microbiol ; 11: 528051, 2020.
Article in English | MEDLINE | ID: mdl-33193120

ABSTRACT

In this article, we present our three-class course sequence to educate students about microbiome analysis and metagenomics through experiential learning by taking them from inquiry to analysis of the microbiome: Molecular Ecology Lab, Bioinformatics, and Computational Microbiome Analysis. Students developed hypotheses, designed lab experiments, sequenced the DNA from microbiomes, learned basic Python/R scripting, became proficient in at least one microbiome analysis software package, and were able to analyze data generated from the microbiome experiments. While over 150 students (graduate and undergraduate) were impacted by the development of the series of courses, our assessment was only on undergraduate learning, where 45 students enrolled in at least one of the three courses and 4 students took all three. Students gained skills in bioinformatics through the courses, and several positive comments were received through surveys and private correspondence. Through a summative assessment, general trends show that students became more proficient in comparative genomic techniques and had positive attitudes toward their abilities to bridge biology and bioinformatics. While most students took only one or two of the courses, we show that pre- and post-surveys of these individual classes still showed progress toward learning objectives. It is expected that students trained will enter the workforce with skills needed to innovate in the biotechnology, health, and environmental industries. Students are trained to maximize impact and tackle real world problems in biology and medicine with their learned knowledge of data science and machine learning. The course materials for the new microbiome analysis course are available on GitHub: https://github.com/EESI/Comp_Metagenomics_resources.

16.
J Circadian Rhythms ; 18: 6, 2020 Oct 21.
Article in English | MEDLINE | ID: mdl-33133210

ABSTRACT

BACKGROUND: Circadian misalignment can impair healthcare shift workers' physical and mental health, resulting in sleep deprivation, obesity, and chronic disease. This multidisciplinary research team assessed eating patterns and sleep/physical activity of healthcare workers on three different shifts (day, night, and rotating-shift). To date, no study of real-world shift workers' daily eating and sleep has utilized a largely-objective measurement. METHOD: During this fourteen-day observational study, participants wore two devices (Actiwatch and Bite Technologies counter) to measure physical activity, sleep, light exposure, and eating time. Participants also reported food intake via food diaries on personal mobile devices. RESULTS: In fourteen (5 day-, 5 night-, and 4 rotating-shift) participants, no baseline difference in BMI was observed. Overall, rotating-shift workers consumed fewer calories and had less activity and sleep than day- and night-shift workers. For eating patterns, compared to night- and rotating-shift, day-shift workers ate more frequently during work days. Night workers, however, consumed more calories at work relative to day and rotating workers. For physical activity and sleep, night-shift workers had the highest activity and least sleep on work days. CONCLUSION: This pilot study utilized primarily objective measurement to examine shift workers' habits outside the laboratory. Although no association between BMI and eating patterns/activity/sleep was observed across groups, a small, homogeneous sample may have influenced this. Overall, shift work was associated with 1) increased calorie intake and higher-fat and -carbohydrate diets and 2) sleep deprivation. A larger, more diverse sample can participate in future studies that objectively measure shift workers' real-world habits.

17.
Biology (Basel) ; 9(11)2020 Oct 28.
Article in English | MEDLINE | ID: mdl-33126516

ABSTRACT

Machine learning algorithms can learn mechanisms of antimicrobial resistance from DNA sequence data without any a priori information. Interpreting a trained machine learning algorithm can be exploited for validating the model and obtaining new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, there are trade-offs between interpretability, computational complexity and accuracy for different feature extraction methods. In this study, we have proposed a new feature extraction method, counting amino acid k-mers or oligopeptides, which provides easier model interpretation compared to counting nucleotide k-mers and reaches the same or even better accuracy in comparison with different methods. Additionally, we have trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We have built a new feature selection pipeline for extraction of important features so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that only use a small number of features and can predict resistance accurately.
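
The feature extraction step named in this abstract, counting amino acid k-mers, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the protein sequences are already translated, and the function names (`count_kmers`, `build_feature_matrix`) and the toy sequences are hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer (oligopeptide of length k) in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_feature_matrix(sequences, k):
    """Return (vocabulary, matrix): one row of k-mer counts per input sequence."""
    counts = [count_kmers(s, k) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy proteins that differ by a single substitution (I -> L).
proteins = ["MKTAYIAK", "MKTAYLAK"]
vocabulary, matrix = build_feature_matrix(proteins, k=3)
for kmer, column in zip(vocabulary, zip(*matrix)):
    print(kmer, column)
```

Each column of the matrix corresponds to one oligopeptide, so a feature that a downstream model weights heavily maps directly to a short stretch of protein sequence, which is the interpretability advantage the abstract claims over nucleotide k-mers.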

18.
PLoS Comput Biol ; 16(9): e1008269, 2020 09.
Article in English | MEDLINE | ID: mdl-32941419

ABSTRACT

We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the development of effective containment strategies and, potentially, therapeutic and vaccine strategies. However, identifying viral subtypes in real-time is challenging: SARS-CoV-2 is a novel virus, and the pandemic is rapidly expanding. Viral subtypes may be difficult to detect due to rapid evolution; founder effects are more significant than selection pressure; and the clustering threshold for subtyping is not standardized. We propose to identify mutational signatures of available SARS-CoV-2 sequences using a population-based approach: an entropy measure followed by frequency analysis. These signatures, Informative Subtype Markers (ISMs), define a compact set of nucleotide sites that characterize the most variable (and thus most informative) positions in the viral genomes sequenced from different individuals. Through ISM compression, we find that certain distant nucleotide variants covary, including non-coding and ORF1ab sites covarying with the D614G spike protein mutation which has become increasingly prevalent as the pandemic has spread. ISMs are also useful for downstream analyses, such as spatiotemporal visualization of viral dynamics. By analyzing sequence data available in the GISAID database, we validate the utility of ISM-based subtyping by comparing spatiotemporal analyses using ISMs to epidemiological studies of viral transmission in Asia, Europe, and the United States. In addition, we show the relationship of ISMs to phylogenetic reconstructions of SARS-CoV-2 evolution, and therefore, ISMs can play an important complementary role to phylogenetic tree-based analysis, such as is done in the Nextstrain project. The developed pipeline dynamically generates ISMs for newly added SARS-CoV-2 sequences and updates the visualization of pandemic spatiotemporal dynamics, and is available on GitHub at https://github.com/EESI/ISM (Jupyter notebook), https://github.com/EESI/ncov_ism (command line tool) and via an interactive website at https://covid19-ism.coe.drexel.edu/.
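
The entropy step described in this abstract can be sketched on a toy alignment. This is only an illustration of per-site Shannon entropy and site selection, not the published ISM pipeline: the frequency analysis and any handling of gaps or ambiguous bases are omitted, and the function names, the threshold value, and the sequences are assumptions made for the example.

```python
from collections import Counter
from math import log2


def site_entropy(column):
    """Shannon entropy (in bits) of the bases observed at one alignment column."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def informative_sites(alignment, threshold):
    """Indices of columns whose entropy exceeds the (assumed) threshold."""
    return [i for i, col in enumerate(zip(*alignment)) if site_entropy(col) > threshold]


def subtype_marker(sequence, sites):
    """Compress a sequence to the bases at the selected sites."""
    return "".join(sequence[i] for i in sites)


# Hypothetical aligned sequences; only columns 1 and 4 vary.
alignment = ["ACGTA", "ACGTG", "ATGTA", "ATGTG"]
sites = informative_sites(alignment, threshold=0.5)
print(sites)                                            # [1, 4]
print([subtype_marker(s, sites) for s in alignment])    # ['CA', 'CG', 'TA', 'TG']
```

Conserved columns have zero entropy and drop out, so each genome is reduced to a short string over the variable positions, which is the sense in which the abstract calls the markers compact.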


Subjects
Betacoronavirus/classification , Betacoronavirus/genetics , Coronavirus Infections , Genomics/methods , Pandemics , Viral Pneumonia , COVID-19 , Coronavirus Infections/epidemiology , Coronavirus Infections/transmission , Coronavirus Infections/virology , Molecular Evolution , Genetic Markers/genetics , Viral Genome/genetics , Humans , Mutation/genetics , Phylogeny , Viral Pneumonia/epidemiology , Viral Pneumonia/transmission , Viral Pneumonia/virology , Viral RNA/genetics , SARS-CoV-2 , Sequence Alignment , RNA Sequence Analysis , Spatio-Temporal Analysis
19.
BMC Bioinformatics ; 21(1): 412, 2020 Sep 21.
Article in English | MEDLINE | ID: mdl-32957925

ABSTRACT

BACKGROUND: It is a computational challenge for current metagenomic classifiers to keep up with the pace of training data generated by genome sequencing projects, such as the exponentially growing NCBI RefSeq bacterial genome database. When new reference sequences are added to the training data, statically trained classifiers must be rerun on all data, which is highly inefficient. The rich literature on "incremental learning" addresses the need to update an existing classifier to accommodate new data without sacrificing much accuracy compared to retraining the classifier with all data. RESULTS: We demonstrate how classification improves over time by incrementally training a classifier on progressive RefSeq snapshots and testing it on: (a) all known current genomes (as a ground truth set) and (b) a real experimental metagenomic gut sample. We demonstrate that as a classifier model's knowledge of genomes grows, classification accuracy increases. The proof-of-concept naïve Bayes implementation, when updated yearly, now runs in one quarter of the non-incremental time with no accuracy loss. CONCLUSIONS: Classification improves when the classifier has the most current knowledge at its disposal. It is therefore of utmost importance to make classifiers computationally tractable so that they can keep up with the data deluge. The incremental learning classifier can be updated efficiently without reprocessing, or even accessing, the existing database, and therefore saves storage as well as computation resources.
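The reason a naïve Bayes classifier can be updated without revisiting earlier data is that its parameters are just per-class counts, which can be accumulated. The sketch below illustrates that idea only; it is not the paper's implementation. It assumes k-mer count features over the alphabet A, C, G, T, Laplace smoothing with a hypothetical `alpha`, and a uniform prior over classes. The class name and labels are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalKmerNB:
    """Multinomial naive Bayes over k-mer counts, updated by accumulating counts."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha                       # Laplace smoothing (assumed value)
        self.kmer_counts = defaultdict(Counter)  # label -> k-mer -> count
        self.totals = Counter()                  # label -> total k-mers seen

    def _kmers(self, seq):
        return [seq[i:i + self.k] for i in range(len(seq) - self.k + 1)]

    def update(self, labeled_seqs):
        """Add new (label, sequence) pairs; earlier training data is never re-read."""
        for label, seq in labeled_seqs:
            kmers = self._kmers(seq)
            self.kmer_counts[label].update(kmers)
            self.totals[label] += len(kmers)

    def classify(self, read):
        """Return the label with the highest smoothed log-likelihood (uniform prior)."""
        vocab_size = 4 ** self.k  # assumes the alphabet A, C, G, T
        best_label, best_score = None, -math.inf
        for label, counts in self.kmer_counts.items():
            denom = self.totals[label] + self.alpha * vocab_size
            score = sum(
                math.log((counts[kmer] + self.alpha) / denom)
                for kmer in self._kmers(read)
            )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = IncrementalKmerNB(k=2)
    clf.update([("A_rich", "AAAAAAAATA")])   # first "snapshot" of toy references
    clf.update([("GC_rich", "GCGCGCGCGG")])  # later snapshot, added incrementally
    print(clf.classify("GCGC"), clf.classify("AAAA"))
```

Because only the count tables are stored, two successive `update` calls leave the model in the same state as a single call on the combined data, which is the property that makes retraining on the full database unnecessary in this simplified setting.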


Subjects
Gastrointestinal Microbiome/genetics , Bacterial Genome , Machine Learning , Metagenomics/methods , Algorithms , Bacteria/genetics , Bayes Theorem , Humans , Metagenome , DNA Sequence Analysis/methods
20.
Front Microbiol ; 11: 136, 2020.
Article in English | MEDLINE | ID: mdl-32140140

ABSTRACT

Microbiome research has increased dramatically in recent years, driven by advances in technology and significant reductions in the cost of analysis. Such research has unlocked a wealth of data, which has yielded tremendous insight into the nature of microbial communities, including their interactions and effects, both within a host and in an external environment as part of an ecological community. Understanding the role of microbiota, including their dynamic interactions with their hosts and other microbes, can enable the engineering of new diagnostic techniques and interventional strategies that can be used in a diverse spectrum of fields, spanning from ecology and agriculture to medicine and from forensics to exobiology. From June 19 to 23, 2017, the NIH and NSF jointly held an Innovation Lab on Quantitative Approaches to Biomedical Data Science Challenges in our Understanding of the Microbiome. This review is inspired by some of the topics that arose as priority areas from this unique, interactive workshop. The goal of this review is to summarize the Innovation Lab's findings by introducing the reader to emerging challenges, exciting potential, and current directions in microbiome research. The review is broken into five key topic areas: (1) interactions between microbes and the human body, (2) evolution and ecology of microbes, including the role played by the environment and microbe-microbe interactions, (3) analytical and mathematical methods currently used in microbiome research, (4) leveraging knowledge of microbial composition and interactions to develop engineering solutions, and (5) interventional approaches and engineered microbiota that may be enabled by selectively altering microbial composition. As such, this review seeks to arm the reader with a broad understanding of the priorities and challenges in microbiome research today and to provide inspiration for future investigation and multi-disciplinary collaboration.

...