1.
J Chem Inf Model ; 64(9): 3826-3840, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38696451

ABSTRACT

Recent advances in computational methods provide the promise of dramatically accelerating drug discovery. While mathematical modeling and machine learning have become vital in predicting drug-target interactions and properties, there is untapped potential in computational drug discovery due to the vast and complex chemical space. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method called fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library, fragmenting them while attaching specific attributes based on predicted binding affinity and interaction with the target subdomain. In this paper, we further propose a two-stage optimization method that utilizes the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information for optimization shrinks the search space and focuses on promising regions, thereby improving the optimization for candidate ligands. The first optimization stage assembles these fragments into larger compounds using genetic algorithms, followed by a second stage of iterative refinement to produce compounds with enhanced bioactivity. To demonstrate broad applicability, the methodology is demonstrated on three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, the proposed FDSL-DD and a two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method accounting for drug-likeness can still produce potential candidate ligands with a high binding affinity. 
Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly optimize the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.


Subjects
Drug Discovery , Ligands , Drug Discovery/methods , Humans , SARS-CoV-2/metabolism , Algorithms , COVID-19 Drug Treatment , COVID-19/virology
2.
Article in English | MEDLINE | ID: mdl-38822995

ABSTRACT

PURPOSE OF REVIEW: This review aims to explore the interface between artificial intelligence (AI) and chronic pain, seeking to identify areas of focus for enhancing current treatments and yielding novel therapies. RECENT FINDINGS: In the United States, the prevalence of chronic pain is estimated to be upwards of 40%. Its impact extends to increased healthcare costs, reduced economic productivity, and strain on healthcare resources. Addressing this condition is particularly challenging due to its complexity and the significant variability in how patients respond to treatment. Current options often struggle to provide long-term relief, with their benefits rarely outweighing the risks, such as dependency or other side effects. Currently, AI has impacted four key areas of chronic pain treatment and research: (1) predicting outcomes based on clinical information; (2) extracting features from text, specifically clinical notes; (3) modeling 'omic data to identify meaningful patient subgroups with potential for personalized treatments and improved understanding of disease processes; and (4) disentangling complex neuronal signals responsible for pain, which current therapies attempt to modulate. As AI advances, leveraging state-of-the-art architectures will be essential for improving chronic pain treatment. Current efforts aim to extract meaningful representations from complex data, paving the way for personalized medicine. The identification of unique patient subgroups should reveal targets for tailored chronic pain treatments. Moreover, enhancing current treatment approaches is achievable by gaining a more profound understanding of patient physiology and responses. This can be realized by leveraging AI on the increasing volume of data linked to chronic pain.

3.
BMC Genomics ; 24(1): 212, 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37095444

ABSTRACT

BACKGROUND: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. METHODS: Here, we analyzed biospecimens from 22 eoRCC patients that were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. RESULTS: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood mononuclear cells (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. CONCLUSIONS: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.


Subjects
Renal Cell Carcinoma , Kidney Neoplasms , Humans , Genetic Predisposition to Disease , DNA Replication , Germ-Line Mutation , Germ Cells
4.
Bioinformatics ; 38(8): 2344-2347, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35157026

ABSTRACT

MOTIVATION: The analysis of mutational signatures is becoming increasingly common in cancer genetics, with emerging implications in cancer evolution, classification, treatment decision and prognosis. Recently, several packages have been developed for mutational signature analysis, with each using different methodology and yielding significantly different results. Because of the non-trivial differences in tools' refitting results, researchers may desire to survey and compare the available tools, in order to objectively evaluate the results for their specific research question, such as which mutational signatures are prevalent in different cancer types. RESULTS: Due to the need for effective comparison of refitting mutational signatures, we introduce a user-friendly software that can aggregate and visually present results from different refitting packages. AVAILABILITY AND IMPLEMENTATION: MetaMutationalSigs is implemented using R and python and is available for installation using Docker and available at: https://github.com/EESI/MetaMutationalSigs.


Subjects
Neoplasms , Software , Humans , Mutation , Neoplasms/genetics
5.
PLoS Comput Biol ; 17(9): e1009345, 2021 09.
Article in English | MEDLINE | ID: mdl-34550967

ABSTRACT

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).


Subjects
Deep Learning , Microbiota/genetics , Computer Neural Networks , 16S Ribosomal RNA/genetics , Algorithms , Computational Biology , Genetic Databases , Gastrointestinal Microbiome/genetics , Host Microbial Interactions/genetics , Humans , Inflammatory Bowel Diseases/microbiology , Natural Language Processing , Phenotype , Prevotella/classification , Prevotella/genetics , Prevotella/isolation & purification , Proof of Concept Study , 16S Ribosomal RNA/classification
6.
PLoS Comput Biol ; 16(9): e1008269, 2020 09.
Article in English | MEDLINE | ID: mdl-32941419

ABSTRACT

We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the development of effective containment strategies and, potentially, therapeutic and vaccine strategies. However, identifying viral subtypes in real-time is challenging: SARS-CoV-2 is a novel virus, and the pandemic is rapidly expanding. Viral subtypes may be difficult to detect due to rapid evolution; founder effects are more significant than selection pressure; and the clustering threshold for subtyping is not standardized. We propose to identify mutational signatures of available SARS-CoV-2 sequences using a population-based approach: an entropy measure followed by frequency analysis. These signatures, Informative Subtype Markers (ISMs), define a compact set of nucleotide sites that characterize the most variable (and thus most informative) positions in the viral genomes sequenced from different individuals. Through ISM compression, we find that certain distant nucleotide variants covary, including non-coding and ORF1ab sites covarying with the D614G spike protein mutation which has become increasingly prevalent as the pandemic has spread. ISMs are also useful for downstream analyses, such as spatiotemporal visualization of viral dynamics. By analyzing sequence data available in the GISAID database, we validate the utility of ISM-based subtyping by comparing spatiotemporal analyses using ISMs to epidemiological studies of viral transmission in Asia, Europe, and the United States. In addition, we show the relationship of ISMs to phylogenetic reconstructions of SARS-CoV-2 evolution, and therefore, ISMs can play an important complementary role to phylogenetic tree-based analysis, such as is done in the Nextstrain project. 
The developed pipeline dynamically generates ISMs for newly added SARS-CoV-2 sequences and updates the visualization of pandemic spatiotemporal dynamics, and is available on Github at https://github.com/EESI/ISM (Jupyter notebook), https://github.com/EESI/ncov_ism (command line tool) and via an interactive website at https://covid19-ism.coe.drexel.edu/.
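
The entropy-then-frequency idea in this abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length; the function names, the 0.5-bit threshold, and the toy alignment are hypothetical choices for illustration and are not taken from the authors' pipeline (see the linked repositories for that).

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def informative_sites(aligned, min_entropy=0.5):
    """Return 0-based alignment positions whose entropy exceeds min_entropy."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i in range(length)
        if column_entropy([seq[i] for seq in aligned]) > min_entropy
    ]


def subtype_marker(seq, sites):
    """Concatenate the nucleotides found at the selected sites."""
    return "".join(seq[i] for i in sites)


# Toy alignment (hypothetical data): only positions 1 and 5 vary.
aligned = ["ACGTAC", "ACGTAC", "ATGTAA", "ATGTAA"]
sites = informative_sites(aligned)
# Frequency analysis: count how often each marker string occurs.
markers = Counter(subtype_marker(seq, sites) for seq in aligned)
```

Here the high-entropy columns stand in for the "most variable (and thus most informative) positions", and counting the resulting marker strings is a simple form of the frequency analysis step.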


Subjects
Betacoronavirus/classification , Betacoronavirus/genetics , Coronavirus Infections , Genomics/methods , Pandemics , Viral Pneumonia , COVID-19 , Coronavirus Infections/epidemiology , Coronavirus Infections/transmission , Coronavirus Infections/virology , Molecular Evolution , Genetic Markers/genetics , Viral Genome/genetics , Humans , Mutation/genetics , Phylogeny , Viral Pneumonia/epidemiology , Viral Pneumonia/transmission , Viral Pneumonia/virology , Viral RNA/genetics , SARS-CoV-2 , Sequence Alignment , RNA Sequence Analysis , Spatio-Temporal Analysis
7.
PLoS Comput Biol ; 15(2): e1006721, 2019 02.
Article in English | MEDLINE | ID: mdl-30807567

ABSTRACT

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
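
As a rough illustration of the embedding pipeline described above, the sketch below splits a sequence into overlapping k-mers (the "words") and averages their vectors into one sequence-level vector. The lookup table `toy_vectors` is a hypothetical stand-in for trained Skip-Gram word2vec vectors, and plain averaging is an assumed simplification; the abstract does not say which sentence-embedding technique was used.

```python
def kmers(seq, k=4):
    """Overlapping k-mers of a sequence, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_embedding(seq, vectors, k=4):
    """Average the vectors of all in-vocabulary k-mers; None if none match."""
    hits = [vectors[w] for w in kmers(seq, k) if w in vectors]
    if not hits:
        return None
    dim = len(hits[0])
    return [sum(v[d] for v in hits) / len(hits) for d in range(dim)]


# Hypothetical 2-dimensional vectors standing in for trained k-mer embeddings.
toy_vectors = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0]}
embedding = mean_embedding("ACGT", toy_vectors, k=3)
```

The same averaging could be applied across all reads of a sample to obtain a sample-level vector for downstream classification.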


Subjects
16S Ribosomal RNA/genetics , 16S Ribosomal RNA/physiology , RNA Sequence Analysis/methods , Algorithms , Cluster Analysis , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Microbiota/genetics
8.
J Circadian Rhythms ; 18: 6, 2020 Oct 21.
Article in English | MEDLINE | ID: mdl-33133210

ABSTRACT

BACKGROUND: Circadian misalignment can impair healthcare shift workers' physical and mental health, resulting in sleep deprivation, obesity, and chronic disease. This multidisciplinary research team assessed eating patterns and sleep/physical activity of healthcare workers on three different shifts (day, night, and rotating-shift). To date, no study of real-world shift workers' daily eating and sleep has utilized a largely-objective measurement. METHOD: During this fourteen-day observational study, participants wore two devices (Actiwatch and Bite Technologies counter) to measure physical activity, sleep, light exposure, and eating time. Participants also reported food intake via food diaries on personal mobile devices. RESULTS: In fourteen (5 day-, 5 night-, and 4 rotating-shift) participants, no baseline difference in BMI was observed. Overall, rotating-shift workers consumed fewer calories and had less activity and sleep than day- and night-shift workers. For eating patterns, compared to night- and rotating-shift, day-shift workers ate more frequently during work days. Night workers, however, consumed more calories at work relative to day and rotating workers. For physical activity and sleep, night-shift workers had the highest activity and least sleep on work days. CONCLUSION: This pilot study utilized primarily objective measurement to examine shift workers' habits outside the laboratory. Although no association between BMI and eating patterns/activity/sleep was observed across groups, a small, homogeneous sample may have influenced this. Overall, shift work was associated with 1) increased calorie intake and higher-fat and -carbohydrate diets and 2) sleep deprivation. A larger, more diverse sample can participate in future studies that objectively measure shift workers' real-world habits.

10.
Nucleic Acids Res ; 42(Database issue): D625-32, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24198250

ABSTRACT

POGO-DB (http://pogo.ece.drexel.edu/) provides an easy platform for comparative microbial genomics. POGO-DB allows users to compare genomes using pre-computed metrics that were derived from extensive computationally intensive BLAST comparisons of >2000 microbes. These metrics include (i) average protein sequence identity across all orthologs shared by two genomes, (ii) genomic fluidity (a measure of gene content dissimilarity), (iii) number of 'orthologs' shared between two genomes, (iv) pairwise identity of the 16S ribosomal RNA genes and (v) pairwise identity of an additional 73 marker genes present in >90% prokaryotes. Users can visualize these metrics against each other in a 2D plot for exploratory analysis of genome similarity and of how different aspects of genome similarity relate to each other. The results of these comparisons are fully downloadable. In addition, users can download raw BLAST results for all or user-selected comparisons. Therefore, we provide users with full flexibility to carry out their own downstream analyses, by creating easy access to data that would normally require heavy computational resources to generate. POGO-DB should prove highly useful for researchers interested in comparative microbiology and benefit the microbiome/metagenomic communities by providing the information needed to select suitable phylogenetic marker genes within particular lineages.
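
Metric (ii), genomic fluidity, can be sketched for a single genome pair. The code assumes the commonly used pairwise definition (genes unique to either genome divided by the sum of both gene counts) and assumes genes have already been grouped into ortholog families; whether POGO-DB computes it exactly this way is not stated in the abstract, and the function name is hypothetical.

```python
def pairwise_fluidity(genes_a, genes_b):
    """Gene-content dissimilarity of two genomes, between 0 and 1.

    genes_a, genes_b: iterables of ortholog-family identifiers.
    """
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        raise ValueError("at least one genome must contain genes")
    unique = len(a - b) + len(b - a)  # families found in only one genome
    return unique / (len(a) + len(b))


# Toy genomes (hypothetical family IDs): two of three families are shared.
fluidity = pairwise_fluidity({"g1", "g2", "g3"}, {"g2", "g3", "g4"})
```

Identical gene content gives 0 and fully disjoint content gives 1, which matches the abstract's description of fluidity as a measure of gene content dissimilarity.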


Subjects
Genetic Databases , Microbial Genes , Microbial Genome , Genomics , Internet , Phylogeny , 16S Ribosomal RNA/genetics , Protein Sequence Analysis
11.
BMC Bioinformatics ; 16: 358, 2015 Nov 04.
Article in English | MEDLINE | ID: mdl-26538306

ABSTRACT

BACKGROUND: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a subfield of machine learning, can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. RESULTS: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. CONCLUSIONS: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
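
One of the simplest information-theoretic selection criteria is to rank each feature by its mutual information with the class label. The sketch below illustrates that idea only; it is not Fizzy's implementation or command-line interface, and the function names, discretized abundances, and labels are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (bits) between two parallel discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(table, labels, top=1):
    """Return the `top` feature names with the highest mutual information.

    table: dict mapping feature name -> discretized abundance per sample.
    """
    ranked = sorted(
        table,
        key=lambda name: mutual_information(table[name], labels),
        reverse=True,
    )
    return ranked[:top]


# Toy data (hypothetical): presence/absence of three OTUs in four samples.
labels = ["young", "young", "old", "old"]
table = {
    "OTU_1": [1, 1, 0, 0],  # tracks the label perfectly
    "OTU_2": [1, 0, 1, 0],  # independent of the label
    "OTU_3": [0, 0, 0, 0],  # constant, carries no information
}
selected = rank_features(table, labels)
```

Ranking features one at a time ignores redundancy between them; subset selection methods of the kind the abstract mentions typically add a term that penalizes features carrying the same information.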


Subjects
Computational Biology/methods , Metagenomics/methods , Software , Algorithms , Genetic Databases , Humans , Microbiota/genetics , Vegetarians
12.
J Mol Graph Model ; 127: 108669, 2024 03.
Article in English | MEDLINE | ID: mdl-38011826

ABSTRACT

Fragment-based drug design (FBDD) is one major drug discovery method employed in computer-aided drug discovery. Due to its inherent limitations, this process experiences long processing times and limited success rates. Here we present a new Fragment Databases from Screened Ligands Drug Design method (FDSL-DD) that intelligently incorporates information about fragment characteristics into a fragment-based design approach to the drug development process. The initial step of the FDSL-DD is the creation of a fragment database from a library of docked, drug-like ligands for a specific target, which deviates from the traditional in silico FBDD strategy, incorporating structure-based design screening techniques to combine the advantages of both approaches. Three different protein targets have been tested in this study to demonstrate the potential of the created fragment library and FDSL-DD. Utilizing the FDSL-DD led to an increase in binding affinity for each protein target. The most substantial increase was exhibited by the ligand designed for TIPE2, with a 3.6 kcal/mol difference between the top ligand from the FDSL-DD and top ligand from the high throughput virtual screening (HTVS). Using drug-like ligands in the initial HTVS allows for a greater search of chemical space, with higher efficiency in fragment selection, fewer grid boxes, and potentially identifying more interactions.


Subjects
Drug Design , Drug Discovery , Ligands , Drug Discovery/methods , High-Throughput Screening Assays , Factual Databases
13.
PeerJ ; 11: e14779, 2023.
Article in English | MEDLINE | ID: mdl-36785708

ABSTRACT

A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e., the degree to which an individual cluster includes only related sequences, and completeness, i.e., the degree to which related sequences are grouped in the same cluster rather than broken up into multiple clusters. Most algorithms are conservative in grouping sequences with other sequences. Remote homologs may fail to be clustered together and instead form unnecessarily distinct clusters. The resulting clusters have high homogeneity but completeness that is too low. We propose Complet+, a computationally scalable post-processing method to increase the completeness of clusters without an undue cost in homogeneity. Complet+ proves to effectively merge closely related clusters of proteins that have verified structural relationships in the SCOPe classification scheme, improving the completeness of clustering results at little cost to homogeneity. Applying Complet+ to clusters obtained using MMseqs2's clusterupdate achieves an increased V-measure of 0.09 and 0.05 at the SCOPe superfamily and family levels, respectively. Complet+ also creates more biologically representative clusters, as shown by a substantial increase in Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI) metrics when comparing predicted clusters to biological classifications. Complet+ similarly improves clustering metrics when applied to other methods, such as CD-HIT and linclust. Finally, we show that Complet+ runtime scales linearly with respect to the number of clusters being post-processed on a COG dataset of over 3 million sequences. Code and supplementary information are available on Github: https://github.com/EESI/Complet-Plus.
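
The homogeneity/completeness trade-off and the V-measure can be made concrete with a small sketch using the standard entropy-based definitions (V-measure is the harmonic mean of the two). The family labels and cluster assignments are hypothetical toy data, and the code does not reproduce Complet+ itself; it only shows how merging over-split clusters can raise completeness without lowering homogeneity.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(truth, predicted):
    """Return (homogeneity, completeness, V-measure)."""
    h_truth, h_pred = entropy(truth), entropy(predicted)
    homogeneity = (
        1.0 if h_truth == 0
        else 1.0 - conditional_entropy(truth, predicted) / h_truth
    )
    completeness = (
        1.0 if h_pred == 0
        else 1.0 - conditional_entropy(predicted, truth) / h_pred
    )
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2 * homogeneity * completeness / total
    return homogeneity, completeness, v


# Toy example (hypothetical): family 1 is split across two clusters.
truth = ["fam1", "fam1", "fam1", "fam1", "fam2", "fam2"]
over_split = [0, 0, 1, 1, 2, 2]   # conservative clustering
merged = [0, 0, 0, 0, 2, 2]       # after merging clusters 0 and 1

before = v_measure(truth, over_split)
after = v_measure(truth, merged)
```

In this toy case the over-split clustering is perfectly homogeneous but incomplete, and merging the two family-1 clusters restores completeness.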


Subjects
Algorithms , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry , Cluster Analysis
14.
ISME J ; 17(10): 1751-1764, 2023 10.
Article in English | MEDLINE | ID: mdl-37558860

ABSTRACT

While genome sequencing has expanded our knowledge of symbiosis, role assignment within multi-species microbiomes remains challenging due to genomic redundancy and the uncertainties of in vivo impacts. We address such questions, here, for a specialized nitrogen (N) recycling microbiome of turtle ants, describing a new genus and species of gut symbiont, Ischyrobacter davidsoniae (Betaproteobacteria: Burkholderiales: Alcaligenaceae), and its in vivo physiological context. A re-analysis of amplicon sequencing data, with precisely assigned Ischyrobacter reads, revealed a seemingly ubiquitous distribution across the turtle ant genus Cephalotes, suggesting ≥50 million years since domestication. Through new genome sequencing, we also show that divergent I. davidsoniae lineages are conserved in their uricolytic and urea-generating capacities. With phylogenetically refined definitions of Ischyrobacter and separately domesticated Burkholderiales symbionts, our FISH microscopy revealed a distinct niche for I. davidsoniae, with dense populations at the anterior ileum. Being positioned at the site of host N-waste delivery, in vivo metatranscriptomics and metabolomics further implicate I. davidsoniae within a symbiont-autonomous N-recycling pathway. While encoding much of this pathway, I. davidsoniae expressed only a subset of the requisite steps in mature adult workers, including the penultimate step deriving urea from allantoate. The remaining steps were expressed by other specialized gut symbionts. Collectively, this assemblage converts inosine, made from midgut symbionts, into urea and ammonia in the hindgut. With urea supporting host amino acid budgets and cuticle synthesis, and with the ancient nature of other active N-recyclers discovered here, I. davidsoniae emerges as a central player in a conserved and impactful, multipartite symbiosis.


Subjects
Ants , Nitrogen , Animals , Ants/physiology , Phylogeny , Symbiosis/genetics , Urea
15.
Bioinformatics ; 27(1): 127-9, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21062764

ABSTRACT

MOTIVATION: Datasets from high-throughput sequencing technologies have yielded a vast amount of data about organisms in environmental samples. Yet, it is still a challenge to assess the exact organism content in these samples because the task of taxonomic classification is too computationally complex to annotate all reads in a dataset. An easy-to-use webserver is needed to process these reads. While many methods exist, only a few are publicly available on webservers, and out of those, most do not annotate all reads. RESULTS: We introduce a webserver that implements the naïve Bayes classifier (NBC) to classify all metagenomic reads to their best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss. AVAILABILITY: Publicly available at: http://nbc.ece.drexel.edu.
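
To show the general shape of naïve Bayes read classification, the sketch below scores a read against per-taxon k-mer counts and returns the best match. It assumes a multinomial model over k-mers with add-one smoothing and a uniform prior over taxa; these modelling choices, the function names, and the toy reference sequences are hypothetical and are not taken from the NBC webserver.

```python
from collections import Counter
from math import log


def kmers(seq, k):
    """Overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(genomes, k=3):
    """genomes: dict taxon -> reference sequence. Returns k-mer counts per taxon."""
    return {taxon: Counter(kmers(seq, k)) for taxon, seq in genomes.items()}


def classify(read, model, k=3):
    """Return the taxon with the highest smoothed log-likelihood for the read."""
    vocab = 4 ** k  # number of possible DNA k-mers
    best_taxon, best_score = None, float("-inf")
    for taxon, counts in model.items():
        total = sum(counts.values())
        score = sum(
            log((counts[w] + 1) / (total + vocab)) for w in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference sequences (hypothetical).
toy_genomes = {"taxon_A": "ACGTACGTACGTACGT", "taxon_B": "GGGCCCGGGCCCGGGC"}
model = train(toy_genomes)
best = classify("ACGTAC", model)
```

Because every read receives a best-scoring taxon, a classifier of this form annotates all reads, which is the property the abstract emphasizes.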


Subjects
Metagenomics/methods , Phylogeny , Software , Algorithms , Bayes Theorem , High-Throughput Nucleotide Sequencing , Internet
16.
Mol Ecol ; 21(13): 3363-78, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22486918

ABSTRACT

Symbiotic bacteria often help their hosts acquire nutrients from their diet, showing trends of co-evolution and independent acquisition by hosts from the same trophic levels. While these trends hint at important roles for biotic factors, the effects of the abiotic environment on symbiotic community composition remain comparably understudied. In this investigation, we examined the influence of abiotic and biotic factors on the gut bacterial communities of fish from different taxa, trophic levels and habitats. Phylogenetic and statistical analyses of 25 16S rRNA libraries revealed that salinity, trophic level and possibly host phylogeny shape the composition of fish gut bacteria. When analysed alongside bacterial communities from other environments, fish gut communities typically clustered with gut communities from mammals and insects. Similar consideration of individual phylotypes (vs. communities) revealed evolutionary ties between fish gut microbes and symbionts of animals, as many of the bacteria from the guts of herbivorous fish were closely related to those from mammals. Our results indicate that fish harbour more specialized gut communities than previously recognized. They also highlight a trend of convergent acquisition of similar bacterial communities by fish and mammals, raising the possibility that fish were the first to evolve symbioses resembling those found among extant gut fermenting mammals.


Subjects
Bacteria/genetics , Fishes/microbiology , Gastrointestinal Tract/microbiology , Metagenome , Animals , Bacteria/classification , Molecular Sequence Data , Phylogeny , 16S Ribosomal RNA/genetics , DNA Sequence Analysis , Symbiosis
17.
mSystems ; 7(2): e0003522, 2022 04 26.
Article in English | MEDLINE | ID: mdl-35311562

ABSTRACT

Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a potentially powerful way to build complex sequence-to-phenotype models. Unfortunately, while they can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Pandemics/prevention & control , High-Throughput Nucleotide Sequencing
18.
Comput Biol Med ; 149: 105969, 2022 10.
Article in English | MEDLINE | ID: mdl-36041271

ABSTRACT

Epidemiological studies show that COVID-19 variants-of-concern, like Delta and Omicron, pose different risks for severe disease, but they typically lack sequence-level information for the virus. Studies which do obtain viral genome sequences are generally limited in time, location, and population scope. Retrospective meta-analyses require time-consuming data extraction from heterogeneous formats and are limited to publicly available reports. Fortuitously, a subset of GISAID, the global SARS-CoV-2 sequence repository, includes "patient status" metadata that can indicate whether a sequence record is associated with mild or severe disease. While GISAID lacks data on comorbidities relevant to severity, such as obesity and chronic disease, it does include metadata for age and sex to use as additional attributes in modeling. With these caveats, previous efforts have demonstrated that genotype-patient status models can be fit to GISAID data, particularly when country-of-origin is used as an additional feature. But are these models robust and biologically meaningful? This paper shows that, in fact, temporal and geographic biases in sequences submitted to GISAID, as well as the evolving pandemic response, particularly reduction in severe disease due to vaccination, create complex issues for model development and interpretation. This paper poses a potential solution: efficient mixed effects machine learning using GPBoost, treating country as a random effect group. Training and validation using temporally split GISAID data and emerging Omicron variants demonstrates that GPBoost models are more predictive of the impact of spike protein mutations on patient outcomes than fixed effect XGBoost, LightGBM, random forests, and elastic net logistic regression models.


Subjects
COVID-19 , Spike Glycoprotein, Coronavirus , COVID-19/epidemiology , Humans , Machine Learning , Mutation , Phylogeny , Retrospective Studies , SARS-CoV-2 , Severity of Illness Index , Spike Glycoprotein, Coronavirus/genetics
19.
Biology (Basel) ; 11(12)2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36552295

ABSTRACT

Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent inputs to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data.

20.
Integr Comp Biol ; 61(6): 2282-2293, 2022 Feb 05.
Article in English | MEDLINE | ID: mdl-34151345

ABSTRACT

Scientific culture and structure organize biological sciences in many ways. We make choices concerning the systems and questions we study. Our research then amplifies these choices into factors that influence the directions of future research by shaping our hypotheses, data analyses, interpretation, publication venues, and dissemination via other methods. But our choices are shaped by more than objective curiosity: we are influenced by cultural paradigms reinforced by societal upbringing and scientific indoctrination during training. This extends to the systems and data that we consider to be ethically obtainable or available for study, and who is considered qualified to do research, ask questions, and communicate about research. It is also influenced by the profitability of concepts like open access, a system designed to improve equity, but which enacts gatekeeping in unintended but foreseeable ways. Creating truly integrative biology programs will require more than intentionally developing departments or institutes that allow overlapping expertise in two or more subfields of biology. Interdisciplinary work requires the expertise of large and diverse teams of scientists working together; this is impossible without an authentic commitment to addressing, not denying, racism when practiced by individuals, institutions, and cultural aspects of academic science. We have identified starting points for remedying how our field has discouraged and caused harm, but we acknowledge there is a long path forward. This path must be paved with field-wide solutions and institutional buy-in: our solutions must match the scale of the problem. Together, we can integrate, not reintegrate, the nuances of biology into our field.


Subjects
Biology , Animals