ABSTRACT
Drug-drug interactions (DDI) are a critical concern in healthcare due to their potential to cause adverse effects and compromise patient safety. Supervised machine learning models for DDI prediction need to be optimized to learn abstract, transferable features, and generalize to larger chemical spaces, primarily due to the scarcity of high-quality labeled DDI data. Inspired by recent advances in computer vision, we present SMR-DDI, a self-supervised framework that leverages contrastive learning to embed drugs into a scaffold-based feature space. Molecular scaffolds represent the core structural motifs that drive pharmacological activities, making them valuable for learning informative representations. Specifically, we pre-trained SMR-DDI on a large-scale unlabeled molecular dataset. We generated augmented views for each molecule via SMILES enumeration and optimized the embedding process through contrastive loss minimization between views. This enables the model to capture relevant and robust molecular features while reducing noise. We then transfer the learned representations for the downstream prediction of DDI. Experiments show that the new feature space has comparable expressivity to state-of-the-art molecular representations and achieved competitive DDI prediction results while training on less data. Additional investigations also revealed that pre-training on more extensive and diverse unlabeled molecular datasets improved the model's capability to embed molecules more effectively. Our results highlight contrastive learning as a promising approach for DDI prediction that can identify potentially hazardous drug combinations using only structural information.
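The contrastive objective described above can be illustrated with a minimal sketch. It assumes an NT-Xent style loss (the abstract only says "contrastive loss"), a temperature of 0.5, and toy two-dimensional embeddings; the function names are hypothetical and this is not the SMR-DDI implementation. Each molecule contributes two views (for example, two enumerated SMILES strings of the same structure), and the loss is low when the two views of a molecule sit closer together than views of different molecules.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss; view_a[i] and view_b[i] embed the same molecule.

    The temperature value is an illustrative assumption.
    """
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of molecule i
        pos_logit = cosine(z[i], z[positive]) / temperature
        # Denominator runs over every other embedding in the batch.
        denom = sum(
            math.exp(cosine(z[i], z[k]) / temperature)
            for k in range(2 * n)
            if k != i
        )
        total += math.log(denom) - pos_logit
    return total / (2 * n)
```

Swapping the second views between molecules should raise the loss, which is a quick sanity check that the positive pairs are wired correctly.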
Subjects
Drug-Related Side Effects and Adverse Reactions, Humans, Drug Interactions, Supervised Machine Learning
ABSTRACT
BACKGROUND: There is a need to understand the duration of infectivity of primary and recurrent coronavirus disease 2019 (COVID-19) and identify predictors of loss of infectivity. METHODS: Prospective observational cohort study with serial viral culture, rapid antigen detection test (RADT) and reverse transcription polymerase chain reaction (RT-PCR) on nasopharyngeal specimens of healthcare workers with COVID-19. The primary outcome was viral culture positivity as indicative of infectivity. Predictors of loss of infectivity were determined using a multivariate regression model. The performance of the US Centers for Disease Control and Prevention (CDC) criteria (fever resolution, symptom improvement, and negative RADT) to predict loss of infectivity was also investigated. RESULTS: In total, 121 participants (91 female [79.3%]; average age, 40 years) were enrolled. Most (n = 107, 88.4%) had received ≥3 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccine doses, and 20 (16.5%) had COVID-19 previously. Viral culture positivity decreased from 71.9% (87/121) on day 5 of infection to 18.2% (22/121) on day 10. Participants with recurrent COVID-19 had a lower likelihood of infectivity than those with primary COVID-19 at each follow-up (day 5 odds ratio [OR], 0.14; P < .001; day 7 OR, 0.04; P = .003) and were all non-infective by day 10 (P = .02). Independent predictors of infectivity included prior COVID-19 (adjusted OR [aOR] on day 5, 0.005; P = .003) and an RT-PCR cycle threshold (Ct) value <23 (aOR on day 5, 22.75; P < .001), but not symptom improvement or RADT result. The CDC criteria would identify 36% (24/67) of all non-infectious individuals on day 7. However, 17% (5/29) of those meeting all the criteria had a positive viral culture. CONCLUSIONS: Infectivity lasts a shorter time in recurrent COVID-19 than in primary infection. Loss of infectivity algorithms could be optimized.
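The reported percentages can be rechecked from the counts given in the abstract. The sketch below only redoes that arithmetic; the variable names are hypothetical. The last figure assumes that the 29 participants meeting all CDC criteria on day 7 are the 24 non-infectious participants identified plus the 5 with a positive culture, which is an inference from the counts rather than something the abstract states.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the abstract.
culture_positive_day5 = pct(87, 121)
culture_positive_day10 = pct(22, 121)

# Share of non-infectious participants that the CDC criteria flag on day 7.
non_infectious_identified_day7 = pct(24, 67)

# Share of participants meeting all criteria who were still culture positive.
culture_positive_despite_criteria = pct(5, 29)

# Assumption: 29 = 24 + 5, so this is the share of criteria-positive
# participants who were truly non-infectious.
non_infectious_among_criteria_met = pct(24, 24 + 5)
```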
Subjects
COVID-19, Adult, Female, Humans, COVID-19/diagnosis, COVID-19 Testing, Health Personnel, Prospective Studies, SARS-CoV-2, Male
ABSTRACT
Thousands of new phages have recently been discovered thanks to viral metagenomics. These phages are extremely diverse and their genome sequences often do not resemble any known phages. To appreciate their ecological impact, it is important to determine their bacterial hosts. CRISPR spacers can be used to predict hosts of unknown phages, as spacers represent biological records of past phage-bacteria interactions. However, no guidelines have been established to standardize host prediction based on CRISPR spacers. Additionally, there are no tools that use spacers to perform host predictions on large viral datasets. Here, we developed a set of tools that includes all the necessary steps for predicting the hosts of uncharacterized phages. We created a database of >11 million spacers and a program to execute host predictions on large viral datasets. Our host prediction approach uses biological criteria inspired by how CRISPR-Cas systems naturally work as adaptive immune systems, which makes the results easy to interpret. We evaluated the performance using 9484 phages with known hosts and obtained a recall of 49% and a precision of 69%. We also found that this host prediction method yielded higher performance for phages that infect gut-associated bacteria, suggesting it is well suited for gut-virome characterization.
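A spacer-based host prediction and its evaluation can be sketched as follows. This is a simplified illustration, not the published tool: the one-mismatch tolerance, the "host with the most matching spacers" rule, the forward-strand-only search, and all function names are assumptions. Precision is computed over phages that received a prediction and recall over all phages with a known host, which is one common convention and may differ from the one used in the study.

```python
from collections import Counter


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def spacer_matches(spacer, genome, max_mismatches=1):
    """True if the spacer aligns anywhere in the genome within the tolerance."""
    k = len(spacer)
    return any(
        mismatches(spacer, genome[i:i + k]) <= max_mismatches
        for i in range(len(genome) - k + 1)
    )


def predict_host(phage_genome, spacers_by_host, max_mismatches=1):
    """Return the host whose spacers hit the phage most often, or None."""
    hits = Counter()
    for host, spacers in spacers_by_host.items():
        hits[host] = sum(
            spacer_matches(s, phage_genome, max_mismatches) for s in spacers
        )
    if not hits:
        return None
    host, count = hits.most_common(1)[0]
    return host if count > 0 else None


def precision_recall(predicted, known):
    """predicted maps phage -> host or None; known maps phage -> true host."""
    made = [p for p in predicted if predicted[p] is not None]
    correct = sum(predicted[p] == known[p] for p in made)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(known)
    return precision, recall
```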
Subjects
Bacteriophages, Clustered Regularly Interspaced Short Palindromic Repeats, Nucleic Acid Databases, Bacterial Genome, Metagenomics/methods, Gastrointestinal Tract/microbiology, Internet, Software
ABSTRACT
BACKGROUND: Microglia participate in the immune response upon central nervous system (CNS) infections. However, the role of these cells during herpes simplex encephalitis (HSE) has not been fully characterized. We sought to identify different microglia/microglia-like cells and describe the potential mechanisms and signaling pathways involved during HSE. METHODS: The transcriptional response of CD11b+ immune cells, including microglia/microglia-like cells, was investigated using single-cell RNA sequencing (scRNA-seq) on cells isolated from the ventral posterolateral nucleus (VPL)-enriched thalamic regions of C57BL/6N mice intranasally infected with herpes simplex virus-1 (HSV-1) (6 × 10^5 PFUs/20 µl). We further performed scanning electron microscopy (SEM) analysis in VPL regions on day 6 post-infection (p.i.) to provide insight into microglial functions. RESULTS: We describe a novel microglia-like transcriptional response associated with a rare cell population (7% of all analyzed cells), named "in transition" microglia/microglia-like cells in HSE. This new microglia-like transcriptional signature, found in the highly infected thalamic regions, was enriched in specific genes (Retnlg, Cxcr2, Il1f9) usually associated with neutrophils. Pathway analysis of this cell-type transcriptome showed increased NLRP3-inflammasome-mediated interleukin-1β (IL-1β) production, promoting a pro-inflammatory response. These cells' increased expression of viral transcripts suggests that the distinct "in transition" transcriptome corresponds to the intrinsic antiviral immune signaling of HSV-1-infected microglia/microglia-like cells in the thalamus. In accordance with this phenotype, we observed several TMEM119+/IBA-I+ microglia/microglia-like cells immunostained for HSV-1 in highly infected regions. CONCLUSIONS: A new microglia/microglia-like state may potentially shed light on how microglia could react to HSV-1 infection. Our observations suggest that infected microglia/microglia-like cells contribute to an exacerbated CNS inflammation. Further characterization of this transitory state of the microglia/microglia-like cell transcriptome may allow the development of novel immunomodulatory approaches to improve HSE outcomes by regulating the microglial immune response.
Subjects
Herpes Simplex Encephalitis, Human Herpesvirus 1, Animals, Mice, Inbred C57BL Mice, Microglia/metabolism, Transcriptome, Ventral Thalamic Nuclei
ABSTRACT
Circulating levels of the amino acid glutamate are associated with central fat accumulation, yet the pathophysiology of this relationship remains unknown. We aimed to (i) refine and validate the association between circulating glutamate and abdominal obesity in a large twin cohort, and (ii) investigate whether transcriptomic profiles in adipose tissue could provide insight into the biological mechanisms underlying the association. First, in a cohort of 4665 individuals from the TwinsUK resource, we identified individuals with abdominal obesity and compared prevalence of the latter across circulating glutamate quintiles. Second, we used transcriptomic signatures generated from adipose tissue, both subcutaneous and visceral, to investigate associations with circulating glutamate levels. Individuals in the top circulating glutamate quintile had a sevenfold higher prevalence of abdominal obesity compared to those in the bottom quintile. The adipose tissue transcriptomic analyses identified GLUL, encoding Glutamate-Ammonia Ligase, as being associated with circulating glutamate and abdominal obesity, with pronounced signatures in the visceral depot. In conclusion, circulating glutamate is positively associated with the prevalence of abdominal obesity which relates to dysregulated GLUL expression specifically in visceral adipose tissue.
Subjects
Glutamic Acid, Abdominal Obesity, Adipose Tissue/metabolism, Body Mass Index, Gene Expression, Humans, Obesity/metabolism, Abdominal Obesity/genetics
ABSTRACT
BACKGROUND: Deep learning methods are a proven commodity in many fields and endeavors. One of these endeavors is predicting the presence of adverse drug-drug interactions (DDIs). The models generated can predict, with reasonable accuracy, the phenotypes arising from the drug interactions using their molecular structures. Nevertheless, this task requires improvement to be truly useful. Given the complexity of the predictive task, an extensive benchmarking of structure-based models for DDI prediction was performed to evaluate their drawbacks and advantages. RESULTS: We rigorously tested various structure-based models that predict drug interactions using different splitting strategies to simulate different real-world scenarios. In addition to the effects of different training and testing setups on the robustness and generalizability of the models, we explored the contribution of traditional approaches such as multitask learning and data augmentation. CONCLUSION: Structure-based models tend to generalize poorly to unseen drugs despite their ability to accurately identify new DDIs among drugs seen during training. Indeed, they efficiently propagate information between known drugs and could be valuable for discovering new DDIs in a database. However, these models will most probably fail when exposed to unknown drugs. While multitask learning does not help solve the problem in our case, the use of data augmentation does at least mitigate it. Therefore, researchers must be cautious of the bias of the random evaluation scheme, especially if their goal is to discover new DDIs.
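The difference between a random evaluation scheme and an unseen-drug scheme can be made concrete with a short sketch. The function names, the 20% test fraction, and the rule that a test pair needs at least one held-out drug are illustrative assumptions, not the benchmark's exact protocol. In the random split, every drug in the test set usually also appears in training pairs; in the unseen-drug split, held-out drugs never appear in training.

```python
import random


def random_pair_split(pairs, test_frac=0.2, seed=0):
    """Split interaction pairs at random; drugs may appear on both sides."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


def unseen_drug_split(pairs, test_frac=0.2, seed=0):
    """Hold out whole drugs so that test pairs involve drugs unseen in training."""
    rng = random.Random(seed)
    drugs = sorted({drug for pair in pairs for drug in pair[:2]})
    rng.shuffle(drugs)
    n_held = max(1, int(len(drugs) * test_frac))
    held_out = set(drugs[:n_held])
    train = [p for p in pairs if p[0] not in held_out and p[1] not in held_out]
    test = [p for p in pairs if p[0] in held_out or p[1] in held_out]
    return train, test, held_out
```

Each pair is assumed to be a tuple whose first two entries are drug identifiers, for example `("warfarin", "aspirin", 1)`.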
Subjects
Pharmaceutical Preparations, Factual Databases, Drug Interactions
ABSTRACT
Nostoc (Nostocales, Cyanobacteria) has a global distribution in the Polar Regions. However, the genomic diversity of Nostoc is little known and there are no genomes available for polar Nostoc. Here we carried out the first genomic analysis of the Nostoc commune morphotype with a recent sample from the High Arctic and a herbarium specimen collected during the British Arctic Expedition (1875-76). Comparisons of the polar genomes with 26 present-day non-polar members of the Nostocales family highlighted that there are pronounced genetic variations among Nostoc strains and species. Osmoprotection and other stress genes were found in all Nostoc strains, but the two Arctic strains had markedly higher numbers of biosynthetic gene clusters for uncharacterised non-ribosomal peptide synthetases, suggesting a high diversity of secondary metabolites. Since viral-host interactions contribute to microbial diversity, we analysed the CRISPR-Cas systems in the Arctic and two temperate Nostoc species. There were a large number of unique repeat-spacer arrays in each genome, indicating diverse histories of viral attack. All Nostoc strains had a subtype I-D system, but the polar specimens also showed evidence of a subtype I-B system that has not been previously reported in cyanobacteria, suggesting diverse cyanobacteria-virus interactions in the Arctic.
Subjects
CRISPR-Cas Systems, Nostoc, Genomics, Multigene Family, Nostoc/genetics, Phylogeny
ABSTRACT
BACKGROUND: Polypharmacy is common among older adults and it represents a public health concern, due to the negative health impacts potentially associated with the use of several medications. However, the large number of medication combinations and sequences of use makes it complicated for traditional statistical methods to predict which therapy is genuinely associated with health outcomes. The project aims to use artificial intelligence (AI) to determine the quality of polypharmacy among older adults with chronic diseases in the province of Québec, Canada. METHODS: We will use data from the Quebec Integrated Chronic Disease Surveillance System (QICDSS). QICDSS contains information about prescribed medications in older adults in Quebec collected over 20 years. It also includes diagnostic codes and procedures, and sociodemographic data linked through a unique identification number for each individual. Our research will be structured around three interconnected research axes: AI, Health, and Law&Ethics. The AI research axis will develop algorithms for finding frequent patterns of medication use that correlate with health events, considering data locality and temporality (explainable AI or XAI). The Health research axis will translate these patterns into polypharmacy indicators relevant to public health surveillance and clinicians. The Law&Ethics axis will assess the social acceptability of the algorithms developed using AI tools and the indicators developed by the Health axis and will ensure that the developed indicators neither discriminate against any population group nor increase the disparities already present in the use of medications.
DISCUSSION: The multi-disciplinary research team consists of specialists in AI, health data, statistics, pharmacy, public health, law, and ethics, which will allow investigation of polypharmacy from different points of view and will contribute to a deeper understanding of the clinical, social, and ethical issues surrounding polypharmacy and its surveillance, as well as the use of AI for health record data. The project results will be disseminated to the scientific community, healthcare professionals, and public health decision-makers in peer-reviewed publications, scientific meetings, and reports. The diffusion of the results will ensure the confidentiality of individual data.
Subjects
Artificial Intelligence, Polypharmacy, Aged, Chronic Disease, Data Analysis, Humans, Quebec
ABSTRACT
BACKGROUND: In 2017, the Democratic Republic of the Congo (DRC) recorded its eighth Ebola virus disease (EVD) outbreak, approximately 3 years after the previous outbreak. METHODS: Suspected cases of EVD were identified on the basis of clinical and epidemiological information. Reverse transcription-polymerase chain reaction (RT-PCR) analysis or serological testing was used to confirm Ebola virus infection in suspected cases. The causative virus was later sequenced from an RT-PCR-positive individual and assessed using phylogenetic analysis. RESULTS: Three probable and 5 laboratory-confirmed cases of EVD were recorded between 27 March and 1 July 2017 in the DRC. Fifty percent of cases died from the infection. EVD cases were detected in 4 separate areas, resulting in > 270 contacts monitored. The complete genome of the causative agent, a variant from the Zaire ebolavirus species, denoted Ebola virus Muyembe, was obtained using next-generation sequencing. This variant is genetically closest, with 98.73% homology, to the Ebola virus Mayinga variant isolated from the first DRC outbreaks in 1976-1977. CONCLUSION: A single spillover event into the human population is responsible for this DRC outbreak. Human-to-human transmission resulted in limited dissemination of the causative agent, a novel Ebola virus variant closely related to the initial Mayinga variant isolated in 1976-1977 in the DRC.
Subjects
Disease Outbreaks, Ebolavirus/genetics, Ebola Virus Disease/diagnosis, Ebola Virus Disease/epidemiology, Adolescent, Adult, Democratic Republic of the Congo/epidemiology, Ebolavirus/immunology, Female, Ebola Virus Disease/transmission, Ebola Virus Disease/virology, High-Throughput Nucleotide Sequencing, Humans, Male, Middle Aged, Phylogeny, Viral RNA/genetics, Reverse Transcriptase Polymerase Chain Reaction, Serologic Tests, Young Adult
ABSTRACT
Background: Whole genome sequencing (WGS) studies can enhance our understanding of the role of patients with asymptomatic Clostridium difficile colonization in transmission. Methods: Isolates obtained from patients with Clostridium difficile infection (CDI) and colonization identified in a study conducted during 2006-2007 at 6 Canadian hospitals underwent typing by pulsed-field gel electrophoresis, multilocus sequence typing, and WGS. Isolates from incident CDI cases not in the initial study were also sequenced where possible. Ward movement and typing data were combined to identify plausible donors for each CDI case, as defined by shared time and space within predefined limits. Proportions of plausible donors for CDI cases that were colonized, infected, or both were examined. Results: Five hundred fifty-four isolates were sequenced successfully, 353 from colonized patients and 201 from CDI cases. The NAP1/027/ST1 strain was the most common strain, found in 124 (62%) of infected and 92 (26%) of colonized patients. A donor with a plausible ward link was found for 81 CDI cases (40%) using WGS with a threshold of ≤2 single nucleotide polymorphisms to determine relatedness. Sixty-five (32%) CDI cases could be linked to both infected and colonized donors. Exclusive linkages to infected and colonized donors were found for 28 (14%) and 12 (6%) CDI cases, respectively. Conclusions: Colonized patients contribute to transmission, but CDI cases are more likely linked to other infected patients than colonized patients in this cohort with high rates of the NAP1/027/ST1 strain, highlighting the importance of local prevalence of virulent strains in determining transmission dynamics.
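The donor definition used above (shared time and space, plus a threshold of ≤2 single nucleotide polymorphisms) can be sketched as a simple filter. The abstract does not give the predefined time and space limits, so this sketch assumes the simplest version: the same ward with overlapping stay dates. The `Stay` record, the day-number encoding, and the function names are hypothetical.

```python
from collections import namedtuple

# One ward stay; start and end are day numbers (an illustrative encoding).
Stay = namedtuple("Stay", "patient ward start end")


def shares_ward_time(a, b):
    """True if two stays are on the same ward and their date ranges overlap."""
    return a.ward == b.ward and a.start <= b.end and b.start <= a.end


def plausible_donors(case_stay, other_stays, snp_distance, max_snps=2):
    """Patients linked to the case by ward overlap and genomic relatedness.

    snp_distance maps frozenset({patient_a, patient_b}) -> SNP count.
    Pairs without a sequenced comparison are skipped.
    """
    donors = []
    for stay in other_stays:
        if stay.patient == case_stay.patient:
            continue
        snps = snp_distance.get(frozenset((case_stay.patient, stay.patient)))
        if snps is None or snps > max_snps:
            continue
        if shares_ward_time(case_stay, stay) and stay.patient not in donors:
            donors.append(stay.patient)
    return donors
```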
Subjects
Clostridioides difficile/genetics, Clostridium Infections/microbiology, Clostridium Infections/transmission, Whole Genome Sequencing, Carrier State, Cross Infection/microbiology, Cross Infection/transmission, Bacterial DNA/genetics, Bacterial Genome, Humans
ABSTRACT
Untargeted metabolomic measurements using mass spectrometry are a powerful tool for uncovering new small molecules with environmental and biological importance. The small molecule identification step, however, remains an enormous challenge due to fragmentation difficulties or unspecific fragment ion information. Current methods to address this challenge are often dependent on databases or require the use of nuclear magnetic resonance (NMR), which have their own difficulties. The use of gas-phase collision cross section (CCS) values obtained from ion mobility spectrometry (IMS) measurements was recently demonstrated to reduce the number of false positive metabolite identifications. While promising, the amount of empirical CCS information currently available is limited, thus predictive CCS methods need to be developed. In this article, we expand upon current experimental IMS capabilities by predicting the CCS values using a deep learning algorithm. We successfully developed and trained a prediction model for CCS values requiring only information about a compound's SMILES notation and ion type. The use of data from five different laboratories using different instruments allowed the algorithm to be trained and tested on more than 2400 molecules. The resulting CCS predictions were found to achieve a coefficient of determination of 0.97 and median relative error of 2.7% for a wide range of molecules. Furthermore, the method requires only a small amount of processing power to predict CCS values. Considering the performance, time, and resources necessary, as well as its applicability to a variety of molecules, this model was able to outperform all currently available CCS prediction algorithms.
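The two reported metrics, the coefficient of determination and the median relative error, can be computed from paired measured and predicted CCS values as in the sketch below. The function names are hypothetical and the formulas are the standard definitions, which the abstract does not spell out.

```python
from statistics import mean, median


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def median_relative_error(observed, predicted):
    """Median of |predicted - observed| / observed, as a percentage."""
    return 100 * median(abs(p - o) / o for o, p in zip(observed, predicted))
```

The median is less sensitive than the mean to a few badly predicted molecules, which is one reason it is often reported alongside the coefficient of determination.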
Subjects
Deep Learning, Computer Neural Networks, Algorithms, Ion Mobility Spectrometry, Magnetic Resonance Spectroscopy, Mass Spectrometry, Metabolomics
ABSTRACT
A rod-shaped, motile anaerobic bacterium, designated CCRI-22567T, was isolated from a vaginal sample of a woman diagnosed with bacterial vaginosis and subjected to a polyphasic taxonomic study. The novel strain was capable of growth at 30-42 °C (optimum, 42 °C), at pH 5.5-8.5 (optimum, pH 7.0-7.5) and in the presence of 0-1.5% (w/v) NaCl (optimally at 0.5% NaCl). The phylogenetic trees based on 16S rRNA gene sequences showed that strain CCRI-22567T forms a distinct evolutionary lineage independent of other taxa in the family Peptostreptococcaceae. Strain CCRI-22567T exhibited 90.1% 16S rRNA gene sequence similarity to Peptoanaerobacter stomatis ACC19aT and 89.7% to Eubacterium yurii subsp. schtitka ATCC 43716. The three closest organisms with an available whole genome were compared to strain CCRI-22567T for genomic relatedness assessment. The genomic average nucleotide identities (OrthoANIu) obtained with Peptoanaerobacter stomatis ACC19aT, Eubacterium yurii subsp. margaretiae ATCC 43715 and Filifactor alocis ATCC 35896T were 71.8, 70.3 and 69.6%, respectively. Strain CCRI-22567T contained C18:1 ω9c and C18:1 ω9c DMA as the major fatty acids. The DNA G+C content of strain CCRI-22567T based on its genome sequence was 33.8 mol%. On the basis of the phylogenetic, chemotaxonomic and other phenotypic properties, strain CCRI-22567T is considered to represent a new genus and species within the family Peptostreptococcaceae, for which the name Criibacterium bergeronii gen. nov., sp. nov., is proposed. The type strain of Criibacterium bergeronii is CCRI-22567T (=LMG 31278T=DSM 107614T=CCUG 72594T).
ABSTRACT
Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.
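Comparing genomes by k-mer content, without alignment or prior curation, can be sketched as follows. This is an illustration of the general idea only: the Jaccard distance, the value k = 4, and the function names are assumptions and are not taken from Ray Surveyor, which may use a different similarity measure and much longer k-mers. The resulting distance matrix is the kind of input from which a phenetic tree could then be built.

```python
def kmers(sequence, k):
    """Set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def distance_matrix(genomes, k=4):
    """Pairwise k-mer distances; genomes maps name -> sequence."""
    names = sorted(genomes)
    profiles = {name: kmers(genomes[name], k) for name in names}
    matrix = [
        [jaccard_distance(profiles[a], profiles[b]) for b in names]
        for a in names
    ]
    return names, matrix
```

Restricting `profiles` to k-mers drawn from a gene category (for example resistance genes) and comparing the resulting matrix with the whole-genome one mirrors, in spirit, the subset-versus-whole comparison described above.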
Subjects
Computational Biology/methods, Bacterial Genome/genetics, DNA Sequence Analysis/methods, Bacteria/genetics, Biological Evolution, Cluster Analysis, Computer Simulation, Molecular Evolution, Genomics/methods, Metagenomics, Phylogeny, Prokaryotic Cells, Software
ABSTRACT
Wastewater treatment center (WTC) workers may be vulnerable to diseases caused by viruses, such as the common cold, influenza and gastro-intestinal infections. Although there is a substantial body of literature characterizing the microbial community found in wastewater, only a few studies have characterized the viral component of WTC aerosols, despite the fact that most diseases affecting WTC workers are of viral origin and that some of these viruses are transmitted through the air. In this study, we evaluated in four WTCs the presence of 11 viral pathogens of particular concern in this milieu and used a metagenomic approach to characterize the total viral community in the air of one of those WTCs. The presence of viruses in aerosols in different locations of individual WTCs was evaluated and the results obtained with four commonly used air samplers were compared. We detected four of the eleven viruses tested: human adenovirus (hAdV), rotavirus, hepatitis A virus (HAV) and herpes simplex virus type 1 (HSV1). The metagenomic assay uncovered very few viral RNA sequences in WTC aerosols; however, sequences from human DNA viruses were in much greater relative abundance.
Subjects
Aerosols/analysis, Air Microbiology, Air Pollutants/analysis, Environmental Monitoring, Viruses, Liquid Waste Disposal, Wastewater/virology, Humans
ABSTRACT
Gene amplification of specific loci has been described in all kingdoms of life. In the protozoan parasite Leishmania, the product of amplification is usually part of extrachromosomal circular or linear amplicons that are formed at the level of direct or inverted repeated sequences. A bioinformatics screen revealed that repeated sequences are widely distributed in the Leishmania genome and the repeats are chromosome-specific, conserved among species, and generally present in low copy number. Using sensitive PCR assays, we provide evidence that the Leishmania genome is continuously being rearranged at the level of these repeated sequences, which serve as a functional platform for constitutive and stochastic amplification (and deletion) of genomic segments in the population. This process is adaptive as the copy number of advantageous extrachromosomal circular or linear elements increases upon selective pressure and is reversible when selection is removed. We also provide mechanistic insights on the formation of circular and linear amplicons through RAD51 recombinase-dependent and -independent mechanisms, respectively. The whole genome of Leishmania is thus stochastically rearranged at the level of repeated sequences, and the selection of parasite subpopulations with changes in the copy number of specific loci is used as a strategy to respond to a changing environment.
Subjects
Gene Amplification, Protozoan Genome, Inverted Repeat Sequences, Leishmania braziliensis/genetics, Leishmania infantum/genetics, Leishmania major/genetics, Repetitive Nucleic Acid Sequences, Physiological Adaptation/genetics, Computational Biology, DNA Copy Number Variations, Leishmania braziliensis/metabolism, Leishmania infantum/metabolism, Leishmania major/metabolism, Rad51 Recombinase/genetics, Rad51 Recombinase/metabolism, Species Specificity, Stochastic Processes
ABSTRACT
BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. RESULTS: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery. CONCLUSIONS: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes ( http://github.com/aldro61/kover/ ).
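The idea of a reference-free, intelligible model built on k-mers can be illustrated with a deliberately tiny sketch: each genome becomes a set of k-mers, and the model is a single rule of the form "k-mer present" or "k-mer absent". This is not Kover's algorithm or API; the exhaustive single-rule search, k = 5, and the function names are assumptions chosen only to show why such models are easy to read.

```python
def kmer_set(sequence, k):
    """Reference-free representation: the set of k-mers in a genome."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def best_single_kmer_rule(genomes, labels, k=5):
    """Find the one presence/absence rule that best predicts the labels.

    labels[i] is True when genome i shows the phenotype (e.g. resistance).
    Returns (kmer, rule_type, training_accuracy).
    """
    profiles = [kmer_set(g, k) for g in genomes]
    candidates = set().union(*profiles)
    best = None
    for kmer in sorted(candidates):
        present = [kmer in profile for profile in profiles]
        absent = [not p for p in present]
        for rule_type, predictions in (("presence", present), ("absence", absent)):
            hits = sum(p == y for p, y in zip(predictions, labels))
            accuracy = hits / len(labels)
            if best is None or accuracy > best[2]:
                best = (kmer, rule_type, accuracy)
    return best


def apply_rule(rule, genome, k=5):
    """Predict the phenotype of a new genome with a learned rule."""
    kmer, rule_type, _ = rule
    found = kmer in kmer_set(genome, k)
    return found if rule_type == "presence" else not found
```

A rule such as "presence of k-mer X" can be traced back to a genomic locus, which is what makes this family of models interpretable.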
ABSTRACT
The discovery of peptides possessing high biological activity is very challenging because of the enormous diversity of possible peptides, of which only a minority have the desired properties. To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code freely available at http://graal.ift.ulaval.ca/peptide-design/.
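Why a graph formulation can guarantee the optimum without enumerating every peptide is easiest to see on a simplified scoring model. The sketch below assumes the predicted bioactivity decomposes into a per-position term plus a term for each pair of adjacent residues; this additive form and the function names are illustrative assumptions, not the kernel-based model of the paper. Under that assumption, each (position, residue) is a node in a layered graph and the best peptide is the highest-scoring path, found by dynamic programming in time linear in the peptide length.

```python
def best_peptide(alphabet, site_score, pair_score, length):
    """Highest-scoring peptide under an additive site + adjacent-pair model.

    site_score(pos, aa) and pair_score(prev_aa, aa) are caller-supplied
    functions (hypothetical stand-ins for a learned predictor).
    Returns (score, peptide).
    """
    # best[aa] = (score, peptide) of the best prefix ending in residue aa
    best = {aa: (site_score(0, aa), aa) for aa in alphabet}
    for pos in range(1, length):
        extended = {}
        for aa in alphabet:
            extended[aa] = max(
                (score + pair_score(prev, aa) + site_score(pos, aa), prefix + aa)
                for prev, (score, prefix) in best.items()
            )
        best = extended
    return max(best.values())
```

With 20 amino acids and length 15, exhaustive search would score 20^15 peptides, whereas this recursion evaluates on the order of 15 × 20 × 20 transitions.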
Subjects
Antimicrobial Cationic Peptides/chemistry, Antimicrobial Cationic Peptides/pharmacokinetics, Bacterial Physiological Phenomena/drug effects, Drug Discovery/methods, Machine Learning, Protein Sequence Analysis/methods, Amino Acid Sequence, Molecular Sequence Data, Automated Pattern Recognition/methods, Peptides, Protein Interaction Mapping/methods, Structure-Activity Relationship
ABSTRACT
Many clinical isolates of Pseudomonas aeruginosa cause infections that are difficult to eradicate due to their resistance to a wide variety of antibiotics. Key genetic determinants of resistance were identified through genome sequences of 390 clinical isolates of P. aeruginosa, collected from diverse geographic locations between 2003 and 2012, and were related to microbiological susceptibility data for meropenem, levofloxacin, and amikacin. β-Lactamases and integron cassette arrangements were enriched in the established multidrug-resistant lineages of sequence types ST111 (predominantly O12) and ST235 (O11). This study demonstrates the utility of next-generation sequencing (NGS) in defining relevant resistance elements and highlights the diversity of resistance determinants within P. aeruginosa. This information is valuable in furthering the design of diagnostics and therapeutics for the treatment of P. aeruginosa infections.
Subjects
Amikacin/pharmacology , Anti-Bacterial Agents/pharmacology , Levofloxacin/pharmacology , Pseudomonas aeruginosa/drug effects , Thienamycins/pharmacology , Amikacin/therapeutic use , Anti-Bacterial Agents/therapeutic use , Bacterial Typing Techniques , Base Sequence , DNA, Bacterial/genetics , Drug Resistance, Multiple, Bacterial/genetics , Genome, Bacterial/genetics , Humans , Levofloxacin/therapeutic use , Meropenem , Microbial Sensitivity Tests , Multilocus Sequence Typing , Pseudomonas Infections/drug therapy , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/isolation & purification , Sequence Analysis, DNA , Thienamycins/therapeutic use , beta-Lactamases/genetics
ABSTRACT
The rapid, sensitive, and specific identification of infectious pathogens from clinical isolates is a critical need in the hospital setting. Mass spectrometry (MS) has been widely adopted for the identification of bacterial pathogens, although polymerase chain reaction remains the mainstay for the identification of viral pathogens. Here, we explored the capability of MS for the detection of human metapneumovirus (HMPV), a common cause of respiratory tract infections in children. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) sequencing of a single HMPV reference strain (CAN97-83) was used to develop a multiple reaction monitoring (MRM) assay that employed stable isotope-labeled peptide internal standards for quantitation of HMPV. Using this assay, we confirmed the presence of HMPV in viral cultures from 10 infected patients and further assigned genetic lineage based on the presence or absence of variant peptides belonging to the viral matrix and nucleoproteins. Similar results were achieved for primary clinical samples (nasopharyngeal aspirates) from the same individuals. As validation, virus lineages and variant coding sequences were confirmed by next-generation sequencing of viral RNA obtained from the culture samples. Finally, separate dilution series of HMPV A and B lineages were used to further refine the assay, assess its robustness, and determine limits of detection in nasopharyngeal aspirates. Our results demonstrate the applicability of MRM for the identification of HMPV and the assignment of genetic lineage from both viral cultures and clinical samples. More generally, this approach should prove tractable as an alternative to nucleic acid-based sequencing for the multiplexed identification of respiratory virus infections.
Subjects
Metapneumovirus/chemistry , Metapneumovirus/growth & development , Paramyxoviridae Infections/virology , Proteome/analysis , Proteomics , Viral Proteins/analysis , Cells, Cultured , Chromatography, Liquid , Humans , Metapneumovirus/genetics , Metapneumovirus/isolation & purification , RNA, Viral/analysis , RNA, Viral/genetics , Tandem Mass Spectrometry
ABSTRACT
Antimonials are still the mainstay of treatment against leishmaniasis, but drug resistance is increasing. We carried out short-read next-generation sequencing (NGS) and comparative genomic hybridization (CGH) of three independent Leishmania major antimony-resistant mutants. Copy number variations were consistently detected with both NGS and CGH. A major feature of antimony resistance was a novel terminal deletion of variable length (67 kb to 204 kb) of the polyploid chromosome 31 in the three mutants. Terminal deletions in two mutants occurred at the level of inverted repeated sequences. The AQP1 gene, which codes for an aquaglyceroporin, was part of the deleted region, and its transfection into resistant mutants reverted resistance to SbIII. We also highlighted an intrachromosomal amplification of a subtelomeric locus on chromosome 34 in one mutant. This region encodes ascorbate-dependent peroxidase (APX) and glucose-6-phosphate dehydrogenase (G6PDH). Overexpression of these genes in revertant backgrounds conferred resistance to SbIII and protection from reactive oxygen species (ROS). A G6PDH null mutant generated in one revertant exhibited SbIII sensitivity and decreased protection from ROS. Our genomic analyses and functional validation highlighted novel genomic rearrangements, functionally important resistance loci, and the implication of new genes in antimony resistance in Leishmania.